Abstract

Analytical learning is a set of machine-learning techniques for revising the rep- 
resentation of a theory based on a small set of examples of that theory. When the 
representation of the theory is correct and complete but perhaps inefficient, an im- 
portant objective of such analysis is to improve the computational efficiency of the 
representation. 

Several algorithms with this purpose have been suggested, most of which are closely 
tied to a first-order logical language and are variants of goal regression, such as the 
familiar explanation-based generalization (EBG) procedure. But because predicate 
calculus is a poor representation for some domains, we would like to extend these 
learning algorithms to apply to other computational models. 

In this paper we show that the goal-regression technique applies to a large family of 
programming languages, all based on the notion of term-rewriting systems. Included
in this family are three language families of importance to artificial intelligence: logic 
programming (such as Prolog); lambda calculus (such as LISP); and combinator-based 
languages (such as FP). We also exhibit a new analytical learning algorithm, AL-2, 
that learns from success but is otherwise quite different from EBG. 

These results suggest that term-rewriting systems are a good framework for analytical- 
learning research in general, and that further research should be invested in finding 
new learning techniques in the framework. 

Introduction 

Analytical learning, including the various methods collectively known as explanation- 
based learning (EBL), is motivated by the observation that much of human learning derives 
from studying a very small set of examples (“explanations”) in the context of a large knowledge
store. EBL algorithms may be partitioned into those that use explanatory examples to
modify a deficient theory and those that rework a complete and correct theory into a more 
useful form. Among the latter are algorithms, such as the familiar EBG algorithm [23, 15], 
that learn from success, and other algorithms (e.g., [19, 26]) that learn from failure. 

The EBG algorithm changes certain constants in the explanation to variables in such
a way that similar instances may then be solved in one step without having to repeat the
search for a solution. For example, consider this simple logic program for integer addition,
in which plus(a, b, c) is intended to mean a + b = c and s(a) indicates a + 1:

    plus(0, x_1, x_1) :- true.                         (*)

    plus(s(x_2), x_3, s(x_4)) :- plus(x_2, x_3, x_4).  (**)

With this program and the instance plus(s(0), 0, s(0)), the EBG algorithm finds the new
rule, plus(s(0), z, s(z)) :- true, by analyzing the proof and changing certain occurrences of
the constant 0 to a variable z. Subsequently, the new instance plus(s(0), s(0), s(s(0))) can
be solved in one step using the new rule, instead of the two steps required by the original
program, provided the program can decide quickly that the new rule is the appropriate one
for solving this new instance.

The results from applying this technique alone have been a bit disappointing. Among 
the reasons identified in the literature are the following.

• The generalizations tend to be rather weak. Indeed, the longer the proof— and thus 
the more information in the example— the fewer new examples are covered by the 
generalization. 

• Many reasonable and useful generalizations (e.g., in the example above, the rule
plus(z, s(0), s(z)) :- true) are not available using this method alone.

• Over time, as more rules are derived, simple schemes for incorporating these rules into 
the program eventually degrade the performance of the program, instead of improving 
it. The program spends most of its time finding the appropriate rule. 

Other issues also need to be raised. While EBG is often described as a “domain-independent
technique for generalizing explanations” [24], it is not a language-independent
technique. Virtually all variants of the algorithm depend on a first-order logical language,
in which terms can be replaced by variables to obtain a more general rule. Even when the
algorithm is coded in, say, Lisp, one represents the rules in predicate calculus and simulates 
a first-order theorem prover. Yet domains arise in practice for which predicate calculus is 
at best an awkward representation for the essential domain properties [22, 24]. In these 
situations the ability to use another language and still be able to apply analytical learning 
algorithms would be highly desirable. 

Is EBG, then, just a syntactical trick that depends on logic for its existence? If so, its
status as a bona fide learning method is questionable, since important learning phenomena 
ought not to depend upon a particular programming language. If EBG is not dependent 
on logic, then how do we port EBG directly to other languages? For example, in a typical 
functional language the plus program might be coded: 

    plus x y := if x = 0 then y
                else s(plus z y), where x = s(z).

Given the input plus s(0) 0, this program computes s(0) as output. Surely an EBG algorithm
for this language should be able to generalize this example such that the input plus s(0) y 
produces s(y), without first translating to a logical representation. 
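
As a concrete illustration of the functional definition above, here is a minimal Python sketch. It assumes an encoding that is not from the paper: the numeral 0 is the Python integer 0 and s(n) is the tuple ("s", n). The names `s`, `plus`, and `plus_s0` are hypothetical, and `plus_s0` merely hard-codes the generalized rule that an EBG-style algorithm would be expected to produce; it is not derived automatically here.

```python
# Peano numerals (assumed encoding): 0 is the int 0, s(n) is the tuple ("s", n).
def s(n):
    return ("s", n)

def plus(x, y, trace=None):
    """plus x y := if x = 0 then y else s(plus z y), where x = s(z)."""
    if trace is not None:
        trace.append((x, y))      # record every call made during evaluation
    if x == 0:
        return y
    _, z = x                      # destructure x = s(z)
    return s(plus(z, y, trace))

def plus_s0(y):
    """Hypothetical learned rule: plus s(0) y = s(y), applied in one step."""
    return s(y)

calls = []
result = plus(s(0), 0, calls)     # the example instance plus s(0) 0
```

On this instance the original definition makes two calls, whereas the specialized rule answers any input of the form plus s(0) y directly.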

Also, while the formal foundations of EBL have been studied (e.g., [11, 28, 27, 6]), most 
of this work has abstracted away the generalization process in order to model the benefits 
of path compression. Notable exceptions include [4], where a notion of correctness is defined 
for EBG and an EBG algorithm is proved correct, and [8], where EBG is treated as a higher- 
order process (since it handles programs as objects), and where modal logic is introduced to 
distinguish tentative, non-operational constructs from permanent, operational ones. 

Aside from this work, presentations of the EBG algorithm in the AI literature have gener- 
ally been informal, and often incomplete. The elegant PROLOG-EBG algorithm [15] is a case 
in point. In certain cases it will overgeneralize. For example, given the instance plus(0, 0, 0)
and the plus program above, it produces the overgeneralization plus(x, y, z) :- true. Recently
several papers, a thesis, and even a textbook have reproduced this algorithm without
noticing or correcting the problem. All of this points to the need for more rigorous presentations
of analytical-learning algorithms and a consistent framework for such presentations.

This paper addresses both these issues: 

• Language: We show that the EBG algorithm is a special case of an algorithm that we 
call AL-1. We present this algorithm formally in a framework based on term-rewriting
systems (TRS), a formalism that includes, as special cases, logic programming, lambda 
calculus, applicative languages, and other languages. 

• Correctness: In this formalism, the correctness, power, and limitations of the algorithm 
can be carefully studied. Proofs then apply immediately to each of the languages
mentioned above. 

In addition, by separating the mechanics of generalization from other issues that are more 
language dependent, the TRS formalisms help to clarify the fundamental learning problems. 

To show that the TRS framework is also useful for formulating new analytical-learning 
algorithms, we describe a new algorithm, called AL-2. Like EBG, the algorithm learns from 
success while preserving the semantics, and uses the proof of the example to propose new 
rules for potential inclusion in the knowledge base. And like EBG, each new rule may have 
the effect of improving or degrading the average performance of the program, depending on 
what problem instances occur subsequently. Unlike EBG, the language in which the rules 
are expressed is modified. New symbols may be introduced in order to abbreviate frequently 
occurring terms and to shorten common sub-proofs. This resembles what humans do, for 
example, when we say “EBG” instead of “explanation-based generalization” or “prime”
instead of “natural number divisible only by itself and one”. Thus AL-2 joins EBG as
another technique for learning from success, and can be added to a growing list of analytical 
learning methods (e.g., [19, 24, 26, 29]). As the toolbox for analytical learning expands, it 
becomes more important to abstract the algorithms away from specific domains, to formalize 
the procedures, and to characterize their properties. The TRS framework facilitates this task.

The style in which the results in this paper are presented is mathematical, since the main
objective is formally to extend the EBG algorithm. The NASA research project under which 
this research has been conducted, however, is developing practical algorithms for Machine 
Learning. The applications we envision for algorithms of the type considered here include 
the case wherein a correct program or expert system learns from experience to reduce the 
amount of computation and the size of internal storage required to solve “typical” problem
instances, without ever compromising correctness. Experience has shown that application 
programs often expend most of their resources repeating a rather small number of essentially 
identical steps. Effective methods for indexing and rearranging partial computations may, 
therefore, offer significant returns in overall performance. 

The contents of this paper are as follows. We first develop a family of typed-term languages 
that possesses a lattice structure suited to the kind of generalization and specialization 
procedures needed for analytical learning. Based on these we define the class of term- 
rewriting systems that serve as computational models. The AL-1 and AL-2 algorithms 
are expressed in this framework and are accompanied by theorems that characterize their 
behavior. Along the way, we shall compare our framework to related work on term rewriting 
and unification theory. 


Typed-Term Languages 

The algorithms we shall develop operate on symbolic (syntactical) expressions that we call 
terms. The set of all admissible terms is a formal language generated by a special class of 
context-free grammars. The non-terminals of this grammar determine the types (also called 
sorts) assigned to each term; we therefore call these languages typed-term languages. In this 
section we define these languages and give examples. In the following section we develop a 
lattice structure over these terms, so that we can use the meet and join operations in our
algorithms.

Notation. Familiarity with the basic concepts and conventions of formal language theory 
is assumed. Throughout this paper we use the symbol ε to designate the empty string.
Concatenation is denoted by · or simply by juxtaposition. If A is a set of symbols, A*
denotes the Kleene closure of A under finite concatenation, and A+ = A* - {ε}. N denotes
the set of natural numbers.

Definition 1 A typed-term grammar (ttg) is an unambiguous, context-free grammar (𝒩, A,
𝒫, G^0), with the following special form:

• The set 𝒩 of non-terminal symbols is divided into two subsets: a set of general types,
denoted {G^0, G^1, ...}, and a set of special types, denoted {S^1, S^2, ...}. G^0 is the “start”
symbol for the grammar.

• The set A of terminal symbols is likewise divided into subsets. There is a finite set
of constants, {c_1, c_2, ..., c_n}, and for each general type G^i there is a countable set of
variables, denoted {x_1^i, x_2^i, ...}.

• The set 𝒫 of productions satisfies three conditions: (1) For any non-terminal N, the set
of sentences generated by N is non-empty and does not contain the empty string. (2)
For any non-terminal N, the right-hand sides of all its productions (N → α_1 ... α_k)
have the same length (or arity) k_N. For general types this length is one. (3) For each
general type G^i and each variable x_j^i of that type, there is a production G^i → x_j^i. No
other productions contain variables.

Without loss of generality we can assume that all useless symbols and productions have 
been removed from the grammar. We often refer to the non-terminal symbols in a typed- 
term grammar as types. The set of strings that can be generated from the non-terminal N 
is denoted L(N) and described as a typed-term language (ttl) of type N. Note that a string
may have one or more types. 

Also note that, according to our terminology, variables (i.e., the symbols x_j^i) are
terminal symbols. Since some texts describe the non-terminal symbols of a context-free 
grammar as variables, there is potential for confusion. In our terminology, variables are 
distinguished classes of terminal symbols that may occur in the strings generated by the 
grammar. Each countable set of such variables is associated with its own general type. 
Thus G^0 can generate the variables x_1^0, x_2^0, ..., G^1 can generate x_1^1, x_2^1, ..., and so forth.
Variables in our terms play much the same role as universally quantified, bound variables in
the formulas of first-order logic. The set of variables in a term τ is denoted V(τ). The term
is called ground if V(τ) is empty.

Below we shall give three examples of ttg’s generating, respectively, the terms of a logic 
programming language (LP), a simple applicative programming language (AP), and a lambda
calculus-based programming language (LC). These three languages will serve as running 
examples throughout the presentation. 

Example 2 [LP] The grammar below generates a class of goals appropriate to the logic 
program plus in the introduction. Accordingly we call our principal type Goal rather than 
G^0. There is one other general type, to which we assign the non-terminal Term (in preference
to G^1), and several special types (Formula, Conjunction, and Term1). This language has
constant symbols plus, true, s, 0, ∧, comma, and two parentheses. The variables g_i (for
i > 0) are generated by Goal, while the variables x_i are generated by Term. (Only the latter
set of variables is used in conventional logic programming.)


    Goal        → Formula
    Goal        → Conjunction
    Goal        → true
    Goal        → g_i   (for i ≥ 1)
    Formula     → plus ( Term , Term , Term )
    Conjunction → ∧ ( Goal , Goal )
    Term        → 0
    Term        → x_i   (for i ≥ 1)
    Term        → Term1
    Term1       → s ( Term )

Examples of goals generated by this language are plus(s(0), x_17, 0) and g_11. A conjunctive
pair of goals is represented in this language by a parenthesized conjunction, e.g.,
∧(plus(0, 0, 0), plus(s(0), 0, s(0))). Strings that are not goals include x_1, s(x_1, x_2), plus(),
and plus(true, 0, 0).

Example 3 [AP] The grammar below generates the class of terms of a simple functional 
applicative programming language. Such languages are based on combinatory logic [12] and 
capture the basic notions of functional application. In this example, the language has one 
general type, which we shall write Expression instead of G^0. In addition it has one special
type (Application), four constant symbols (plus, succ, open-paren, and close-paren) and a
countable set of variables (x_i). The start symbol is Expression. The expression (τ_1 τ_2),
where τ_1 and τ_2 are arbitrary terms, is intended to indicate that the function denoted by τ_1
is to be applied to the argument τ_2.


    Expression  → Application
    Expression  → 0
    Expression  → succ
    Expression  → plus
    Expression  → x_i   (for i ≥ 1)
    Application → ( Expression Expression )


Examples of terms generated by this grammar are: plus, x_6, (plus (succ x_6)) and ((plus succ) x_5).
Note that the arity of the non-terminal Application is four, while that of Expression is (necessarily)
one.
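
To make the AP grammar concrete, the following is a minimal recursive-descent recognizer written as a sketch under stated assumptions. The tuple encoding of parse trees (a node is `(label, child, ...)`, a leaf is a string), the token syntax (variables written `x1`, `x2`, ...), and the function names `tokenize`, `parse_expression`, `parse_application`, and `parse` are all illustrative choices, not part of the paper.

```python
import re

CONSTANTS = {"0", "succ", "plus"}

def tokenize(text):
    # Parentheses, identifiers such as plus, succ, x6, and the constant 0.
    return re.findall(r"[()]|[A-Za-z0-9]+", text)

def parse_expression(tokens, pos=0):
    """Expression -> Application | 0 | succ | plus | x_i; returns (tree, next_pos)."""
    if pos >= len(tokens):
        raise ValueError("unexpected end of input")
    tok = tokens[pos]
    if tok == "(":
        app, pos = parse_application(tokens, pos)
        return ("Expression", app), pos
    if tok in CONSTANTS or re.fullmatch(r"x[1-9][0-9]*", tok):
        return ("Expression", tok), pos + 1
    raise ValueError(f"unexpected token {tok!r}")

def parse_application(tokens, pos):
    """Application -> ( Expression Expression ), a production of arity four."""
    left, pos = parse_expression(tokens, pos + 1)    # skip the "("
    right, pos = parse_expression(tokens, pos)
    if pos >= len(tokens) or tokens[pos] != ")":
        raise ValueError("expected ')'")
    return ("Application", "(", left, right, ")"), pos + 1

def parse(text):
    tokens = tokenize(text)
    tree, pos = parse_expression(tokens)
    if pos != len(tokens):
        raise ValueError("trailing input")
    return tree
```

Because every Application node has exactly four children and every Expression node exactly one, the arities noted above are visible directly in the tuples the parser builds.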

Example 4 [LC] The grammar below generates the class of terms of a lambda calculus-based
programming language. This language has two general types: Expression and Lambda-param,
whose respective variables are labeled x_i and v_i. The type Lambda-param is unusual in
that variables are the only strings of that type. Expression1 is a special type. The constants
of the language are λ, period, open-paren, close-paren, plus, and others whose utility will
become apparent in subsequent examples.


    Expression   → Expression1
    Expression   → Lambda-param
    Expression   → plus | succ | zero? | second
    Expression   → x_i   (for i ≥ 1)
    Expression1  → λ Lambda-param . Expression
    Expression1  → ( Expression Expression )
    Lambda-param → v_i   (for i ≥ 1)


Examples of terms generated by this grammar are: plus, x_5, (plus (succ x_4)), and
λv_7 . (plus (v_2 v_7)).

In the remainder of this section and in the next, we extend to ttl’s such operations as
substitution, replacement, and unification familiar from first-order unification theory. For
this purpose, we need some notation. Fix a typed-term grammar (ttg), let N be any type,
and let τ be a term of type N. The unique parse tree whose root is labeled N and whose
yield (i.e., the string obtained by concatenating the labels on the leaves in order from left to
right) is τ, is called the parse tree of τ and is written tree(τ). The yield of a parse tree Y is
denoted yield(Y). Thus yield(tree(τ)) = τ.

In this paper we shall adopt a specific data structure for parse trees. Let Y be a parse
tree whose root is labeled r.

• If Y is a leaf, it is represented simply by its label, r.

• Otherwise, let Y_1, ..., Y_k be the (representations of the) immediate subtrees of Y; then
Y is represented:

    r Y_1 ... Y_k

Since ttg’s are unambiguous by definition, and since each non-terminal has a fixed arity, this 
representation is efficient for constructing parse trees from terms and for determining the 
yield of a parse tree. 
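
The prefix representation and the yield can be sketched in a few lines of Python. This is only an illustration under an assumed encoding (a node is a tuple `(label, child, ...)` and a leaf is a string); the names `TREE`, `prefix_form`, and `tree_yield` are hypothetical. The tree used is that of the AP term (plus (x_1 0)), with the variable written `x1`.

```python
# Parse tree of the AP term (plus (x1 0)) in the assumed tuple encoding.
TREE = ("Expression",
        ("Application", "(",
         ("Expression", "plus"),
         ("Expression",
          ("Application", "(", ("Expression", "x1"), ("Expression", "0"), ")")),
         ")"))

def prefix_form(tree):
    """Root label followed by the representations of the immediate subtrees."""
    if isinstance(tree, str):
        return [tree]
    label, *children = tree
    symbols = [label]
    for child in children:
        symbols.extend(prefix_form(child))
    return symbols

def tree_yield(tree):
    """Leaf labels concatenated in order from left to right."""
    if isinstance(tree, str):
        return [tree]
    leaves = []
    for child in tree[1:]:
        leaves.extend(tree_yield(child))
    return leaves

print(" ".join(prefix_form(TREE)))
print(" ".join(tree_yield(TREE)))
```

Both functions visit each node once, which reflects the claim that the fixed arities make this representation cheap to traverse.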

Definition 5 To each node of a tree Y we assign a unique string of integers, called a location,
as follows:

• The location of the root of Y is ε (the empty string);

• Let Y_1, ..., Y_k be the immediate descendants of a node whose location is ω; then for
1 ≤ i ≤ k, the location of Y_i is ω · i.

For brevity the dot (·) will often be omitted when confusion is unlikely, e.g., 12 instead of
1 · 2.


Example 6 [AP] Refer to the grammar above for the AP language. The parse tree for the
term τ = (plus (x_1 0)) is

    tree(τ) = Expression Application ( Expression plus
              Expression Application ( Expression x_1 Expression 0 ) )

The location of the subtree Expression 0 is 1·3·1·3; the location of the subtree 0 is 1·3·1·3·1.


Definition 7 Let τ_1 be a term of type B_1 and τ_2 a term of type B_2 in a ttg. We say that
τ_2 occurs in τ_1 at location ω if, at location ω in tree(τ_1), there is a node labeled B_2 and the
yield of the subtree rooted at this node is τ_2.

The set of term occurrences in τ is the set of locations of nodes in the parse tree of τ that are
labeled with any type N. This set will be written Ω(τ).

The term that occurs at location ω in τ is denoted τ[ω].

Example 8 [AP] In Example 6, Ω(τ) = {ε, 1, 12, 13, 131, 1312, 1313}. The subterm (x_1 0)
of type Expression occurs at location 13; the same subterm, but of type Application, occurs
at location 131.
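
The set of term occurrences can be computed by a single traversal. The sketch below assumes a tuple encoding of parse trees (node = `(label, child, ...)`, leaf = string) and represents a location as a tuple of child indices; the names `occurrences`, `subtree_at`, and `tree_yield` are hypothetical helpers, not the paper's.

```python
# Parse tree of (plus (x1 0)) from Example 6 in the assumed tuple encoding.
TREE = ("Expression",
        ("Application", "(",
         ("Expression", "plus"),
         ("Expression",
          ("Application", "(", ("Expression", "x1"), ("Expression", "0"), ")")),
         ")"))

def occurrences(tree, loc=()):
    """Yield (location, type) for every node labeled with a type, root first."""
    if isinstance(tree, str):         # leaves carry terminals, not types
        return
    yield loc, tree[0]
    for i, child in enumerate(tree[1:], start=1):
        yield from occurrences(child, loc + (i,))

def subtree_at(tree, loc):
    """Follow the child indices in loc; index 0 of a node holds its label."""
    for i in loc:
        tree = tree[i]
    return tree

def tree_yield(tree):
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in tree_yield(child)]

# Locations written as digit strings, with ε standing for the root.
omega = {"".join(map(str, loc)) or "ε" for loc, _ in occurrences(TREE)}
```

Locations 13 and 131 address different nodes with the same yield, which is how one string can occur with two types.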

Definition 9 (Replacement) Let τ_1 be a term of type B_1 and τ_2 a term of type B_2. Let
ω ∈ Ω(τ_1) be the location of a subterm of type B_2 within τ_1. The string τ_1[ω ← τ_2] is obtained
by replacing the term in τ_1 at location ω by τ_2.

Example 10 [AP] Refer again to Example 6. If τ_1 = (plus (x_1 0)) and τ_2 = succ, then
τ_1[13 ← τ_2] = (plus succ).
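
Replacement can be sketched as a purely functional update of the parse tree. The code assumes a tuple encoding of trees (node = `(label, child, ...)`, leaf = string) and locations as tuples of child indices; `replace_at` and `tree_yield` are hypothetical names. The type check mirrors the requirement that the replacing term have the type found at the location.

```python
# Parse tree of (plus (x1 0)) in the assumed tuple encoding.
TREE = ("Expression",
        ("Application", "(",
         ("Expression", "plus"),
         ("Expression",
          ("Application", "(", ("Expression", "x1"), ("Expression", "0"), ")")),
         ")"))

def replace_at(tree, loc, new):
    """Return a copy of tree with the subtree at loc replaced by new."""
    if isinstance(tree, str):
        raise ValueError("location does not address a typed node")
    if not loc:
        if new[0] != tree[0]:
            raise ValueError("replacement must have the type found at the location")
        return new
    i = loc[0]
    return tree[:i] + (replace_at(tree[i], loc[1:], new),) + tree[i + 1:]

def tree_yield(tree):
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree[1:] for leaf in tree_yield(child)]

# Example 10: replace the subterm at location 13 by succ (of type Expression).
result = replace_at(TREE, (1, 3), ("Expression", "succ"))
```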

Definition 11 (Substitution) With respect to a ttg, a substitution is a mapping θ: V → A+
from the set of variables V to non-empty strings of terminals A+ such that (1) for all but
finitely many variables x, θ(x) = x, and (2) if θ(x_j^i) = τ, then τ ∈ L(G^i). That is, a
substitution changes only finitely many variables and maps variables only to terms of the
same type. The set of variables x such that θ(x) ≠ x is called the domain of θ, written
dom(θ). Its codomain is the set of terms θ(dom(θ)).

Since the grammar is unambiguous, the sets L(N) are freely generated for each N [10];
hence there is a unique morphic extension θ̂ of θ from V to the domain of arbitrary strings
in A*:

• θ̂(ε) = ε.

• θ̂(x) = θ(x) for all variables x.

• θ̂(t) = t for all terminal symbols t except variables; and

• θ̂(τ_1 · τ_2) = θ̂(τ_1) · θ̂(τ_2) for any strings τ_1 and τ_2 in A+.

Henceforth we shall not distinguish between θ and its extension θ̂.
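
The four clauses of the morphic extension translate directly into a token-by-token map. In this sketch a string of terminals is a Python list of tokens and a substitution is a dictionary listing only the variables it changes; the name `extend` and the sample substitution are assumptions made for illustration, and the type condition (2) of Definition 11 is not checked.

```python
def extend(theta):
    """Morphic extension of a substitution given as {variable: list of tokens}."""
    def theta_hat(tokens):
        out = []
        for tok in tokens:
            # Variables in dom(theta) are replaced; every other terminal is kept.
            out.extend(theta.get(tok, [tok]))
        return out
    return theta_hat

# Hypothetical substitution {x2 := (x1 0)} over the AP language.
theta_hat = extend({"x2": ["(", "x1", "0", ")"]})
image = theta_hat(["(", "plus", "x2", ")"])
```

Because the map acts on one token at a time, it automatically fixes the empty string and distributes over concatenation.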

Lemma 12 For any type N in a ttg, the language L(N) is closed under replacement and
substitution. Specifically, if N and N' are types, τ ∈ L(N), τ' ∈ L(N'), and ω is the
location of a term of type N' within τ, then the string τ[ω ← τ'] obtained from τ and τ' by
replacement is itself a term in L(N).

Likewise, for any substitution θ and string τ ∈ L(N), the string θ(τ) is itself a term in
L(N).

(The easy proof consists in showing that the term obtained by replacement or substitution 
is still generated by the grammar starting from the non-terminal N .) 

To summarize, we have defined a class of languages generated by typed-term grammars, 
and defined the notions of substitution for variables and replacement of a subterm at a 
specific location. Whereas substitution is purely a string operation, replacement requires 
reference to the parse tree in order to identify the subterm at the given location. Never- 
theless, these notions are quite similar to the corresponding operations for first-order terms. 




One of the most useful features of first-order terms is that they form a lattice under the sub- 
sumption ordering. The meet and join operations of this lattice reflect the semantic notions 
of specialization and generalization, respectively. In the next section, we develop a similar 
algebraic structure for the expressions of a typed-term algebra. 

The Subsumption Lattice of Terms 

In this section we order terms according to generality and develop a lattice structure 
over the set of strings generated by general terms. Much of this is based on the well-known 
theory of first-order terms, so proofs are sketched except where our formalism is substantially 
different. 

Throughout this section we assume that the typed-term grammar 𝒢 = (𝒩, A, 𝒫, G^0) has
been fixed. Let B denote an arbitrary non-terminal symbol in the grammar.

Definition 13 Let ⊥ (“bottom”) be a special symbol not found in the grammar 𝒢. T(B)
is the set {⊥} ∪ L(B), that is, the set of terms generated from the non-terminal B together
with the special term ⊥. Similarly, let 𝒯(B) be the set {⊥} ∪ {tree(τ) | τ ∈ L(B)}.

Definition 14 The binary subsumption relation ⊒ on T(B) is defined as follows:

• τ ⊒ ⊥ for all τ ∈ T(B);

• For τ_1, τ_2 ∈ L(B), τ_1 ⊒ τ_2 iff there exists a substitution θ such that θ(τ_1) = τ_2.

If both τ_1 ⊒ τ_2 and τ_2 ⊒ τ_1, then we say that τ_1 and τ_2 are variants, and write τ_1 ≡ τ_2;
otherwise (when τ_1 ⊒ τ_2 but not conversely) we write τ_1 ⊐ τ_2. ⊑ is the inverse of ⊒:
τ_1 ⊑ τ_2 iff τ_2 ⊒ τ_1. Similarly, ⊏ is the inverse of ⊐.

≡ is an equivalence relation on T(B). ⊒ is a quasi-ordering (a reflexive and transitive
relation) but not a partial ordering; for example, x_1 ⊒ x_2 and x_2 ⊒ x_1 for distinct
variables x_1 and x_2 of the same type.

Definition 15 If a substitution θ is a bijective mapping from A+ to A+, then we call θ a
permutation.

Lemma 16 Two terms τ_1 and τ_2 ∈ L(B) are variants iff there exists a permutation θ such
that θ(τ_1) = τ_2.

PROOF: If τ_1 and τ_2 are variants, then by Definition 14 there exist substitutions θ and ψ such
that θ(τ_1) = τ_2 and ψ(τ_2) = τ_1. For each variable x occurring in τ_1, ψ(θ(x)) = x; thus θ(x)
must be a variable. If x and y are distinct variables occurring in τ_1, then ψ(θ(x)) ≠ ψ(θ(y)),
and thus θ(x) ≠ θ(y). We may thus take as the required permutation a substitution θ' such
that θ'(x) = θ(x) and θ'(θ(x)) = x for all variables x occurring in τ_1, and θ'(x) = x for
variables occurring in neither τ_1 nor τ_2.

The opposite demonstration, that τ_1 and τ_2 are variants if there exists a permutation θ
such that θ(τ_1) = τ_2, is accomplished by setting ψ = θ^{-1} and noting that ψ(τ_2) = τ_1.
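
Lemma 16 suggests a simple test for variants: look for a one-to-one renaming of variables that carries one term onto the other. The sketch below works on token lists and assumes a language with a single general type (such as AP), so that no type check on the renaming is needed; `variant_renaming` and `is_var` are hypothetical names.

```python
def is_var(tok):
    # Assumed variable syntax: x followed by digits, e.g. x1, x17.
    return tok[:1] == "x" and tok[1:].isdigit()

def variant_renaming(t1, t2):
    """Return a one-to-one variable renaming taking t1 to t2, or None."""
    if len(t1) != len(t2):
        return None
    forward, backward = {}, {}
    for a, b in zip(t1, t2):
        if is_var(a) and is_var(b):
            # The renaming must be a function (forward) and injective (backward).
            if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
                return None
        elif a != b:
            return None
    return forward

swap = variant_renaming(["(", "x1", "x2", ")"], ["(", "x2", "x1", ")"])
collapse = variant_renaming(["(", "x1", "x2", ")"], ["(", "x3", "x3", ")"])
```

The second call fails because mapping both x1 and x2 to x3 is a substitution but not an invertible one: the first term subsumes the second without being its variant.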
Although substitution was defined on terms (Definition 11), there is an obvious parallel
operation on parse trees, one that was implicitly used in Lemma 12. Whereas with terms
one replaces a variable x_i^j by a term τ in L(G^j), with trees one replaces the subtree G^j · x_i^j
by the subtree tree(τ) whose root is labeled G^j. Thus over parse trees a substitution θ is a
mapping from tree variables (that is, trees of the form G^j · x_i^j for some j) to trees with the
same root G^j such that θ is the identity mapping for almost all tree variables. As usual, θ
extends morphically to a mapping on 𝒯(B).

With substitution on 𝒯(B) as the basis, we can define the subsumption ordering ⊒ and
equivalence with respect to that ordering (≡) entirely analogously to Definition 14. The
result corresponding to Lemma 16 also holds. 

Lemma 17 Let m be the mapping from T(B) to 𝒯(B) such that m(τ) is the parse tree
for τ whose root is B, and m(⊥) = ⊥. Then m is an order-isomorphism between T(B) and
𝒯(B). That is, m is a bijection that preserves the ordering:

    τ_1 ⊒ τ_2 iff m(τ_1) ⊒ m(τ_2).

The trivial proof is based on the fact that the grammar is unambiguous and that ⊥ is a
distinguished symbol.

Definition 18 T(B)/≡ is the set of equivalence classes of T(B) with respect to the relation
≡ on T(B). Similarly 𝒯(B)/≡ is the set of equivalence classes of 𝒯(B) with respect to the
corresponding equivalence ≡ on 𝒯(B). [τ] denotes the ≡-equivalence class of which τ is a
member.

The purpose of this section is to argue that T(B)/≡ is a meet semilattice for every
type B, and a complete lattice for every general type G^i. The idea is to inject 𝒯(B) into a
lattice of first-order terms so as to preserve meets and joins.

Before doing so, however, let us review some results from unification theory. Recall the 
definition of a family FOT of first-order terms. Let ℱ be a countable set of function symbols
each with a fixed arity, and let V be a countable set of variables such that ℱ ∩ V = ∅.
The set FOT of first-order terms is the smallest set containing V and the nullary functions
(constants) in ℱ and closed under functional application, i.e., F τ_1 ... τ_n, where F ∈ ℱ is a
function symbol of arity n > 0 and τ_i ∈ FOT for each i. With variants in FOT taken to be
equivalent, the set FOT ∪ {⊥} partially ordered by subsumption is a complete lattice, with
effective algorithms for join (⊔) and meet (⊓) [14, 30]. The meet operation is computed using
a unification algorithm, since by a well-known theorem of Robinson, any finite, unifiable set 
of first-order terms has a most-general unifier that is unique modulo variants. 

Since our terms are typed, the first-order theory does not apply directly, but the unifica- 
tion theory of many-sorted terms has also been studied [32]. Briefly, there is a set of sorts; for 
each sort there is a countable set of variables and a finite set of constants; and each function 
symbol f of k arguments is assigned a string W_1 ... W_{k+1} indicating that the i-th argument
has sort W_i (for 1 ≤ i ≤ k) and the result has sort W_{k+1}. There is also a unification theorem
similar to Robinson’s for the well-formed terms generated by these symbols. This assumes,
however, that variables of one sort do not unify with variables of another sort. When one
sort may be a subsort of another, the theory is more complex. For example, if x is a variable
of type real and y is a variable of type integer, then we can unify the terms x and y by
substituting y for x, but not conversely. Walther [32] shows that, when sorts are partially 
ordered, two unifiable terms may have zero, one, or many most general unifiers (mgu's). He 
further shows that a necessary and sufficient condition for a Robinsonian property (existence 
of a unique mgu) is that this partial order among sorts be a forest. 

Since unification is how we propose to implement our meet operations, we likewise seek 
a Robinsonian property to apply to our terms. Moreover, interpreting nonterminals in a ttg 
as “sorts”, we see that the various sorts are related, in that if N_1 → N_2, then N_2 is a subsort
of N_1. However, the ordering is not a forest, and while it may be possible, we have not found
a way to map our use of types onto a sort hierarchy that is a forest. We shall, however, 
obtain a Robinsonian property, indicating that our notion of types is somewhat different. 
This difference is briefly characterized in an appendix to this report. 

The Meet Operation 

Notice that if we view non-terminals as function symbols and relate constants and variables
in the obvious way, parse trees look very much like first-order terms. Indeed, having
established an order-isomorphism between the strings T(B) and their parse trees 𝒯(B), we
are tempted to establish a lattice isomorphism between 𝒯(B)/≡ and the corresponding set
of first-order terms. Unfortunately, this is not possible, because there are many first-order
terms (for example, B x_1, where B is a special type) that correspond to no parse tree. But
as it turns out, 𝒯(B)/≡ is isomorphic to a sub-semilattice of FOT/≡, and to a complete
sublattice when B is a general type G^i.

To establish this correspondence requires some work, but having done so, we shall have,
as a consequence of Lemma 17, that T(B)/≡ is a semilattice (ordered by subsumption) and
that T(G^i)/≡ is a complete lattice. The particular result we need for our algorithms is that
(apart from variants) there is a unique meet (greatest lower bound), τ_1 ⊓ τ_2, for any two
terms τ_1 and τ_2, and an effective algorithm for computing it. Also, a theorem characterizing
the AL-1 algorithm will be based on the fact that 𝒯(G^0)/≡ is a complete lattice.

Example 19 [AP] A brief example will quickly illustrate how we compute the meet of two
parse trees. Refer to Example 6, where the parse tree for the term τ = (plus (x_1 0)) is given.
Suppose we wish to find the greatest lower bound between this and the term τ' = (plus x_2).
The parse tree for τ' is

    Expression Application ( Expression plus Expression x_2 ) .

Treating Expression as a unary and Application as a 4-ary function symbol and plus and
0 as constants, we unify the parse trees for τ and τ' with the usual first-order unification
algorithm. The resulting substitution replaces x_2 by the tree: Application ( Expression x_1
Expression 0 ). The corresponding substitution for the two terms is then {x_2 := (x_1 0)}. The
purpose of using the parse trees to construct the substitution is to ensure that subterms
are unified with other subterms of the appropriate type.
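
The computation in Example 19 can be reproduced with a textbook first-order unification routine applied directly to parse trees. The following is a sketch, not the paper's algorithm: trees are assumed to be tuples `(label, child, ...)` with string leaves, variables are assumed to be leaves written `x1`, `x2`, ..., and the names `is_var`, `walk`, `occurs`, and `unify` are illustrative.

```python
def is_var(t):
    return isinstance(t, str) and t[:1] == "x" and t[1:].isdigit()

def walk(t, subst):
    """Follow variable bindings until an unbound variable or a non-variable."""
    while is_var(t) and t in subst:
        t = subst[t]
    return t

def occurs(v, t, subst):
    t = walk(t, subst)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, c, subst) for c in t[1:])

def unify(a, b, subst=None):
    """Return a unifying substitution as a dict, or None if none exists."""
    subst = {} if subst is None else subst
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if is_var(a):
        return None if occurs(a, b, subst) else {**subst, a: b}
    if is_var(b):
        return unify(b, a, subst)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None                       # clash between distinct symbols

INNER = ("Application", "(", ("Expression", "x1"), ("Expression", "0"), ")")
TAU = ("Expression",
       ("Application", "(", ("Expression", "plus"), ("Expression", INNER), ")"))
TAU_PRIME = ("Expression",
             ("Application", "(", ("Expression", "plus"), ("Expression", "x2"), ")"))

theta = unify(TAU, TAU_PRIME)
```

The binding found for `x2` is an Application tree, and since `x2` sits under an Expression node, the unified subterm necessarily has the right type.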

We define a particular family of first-order terms, FOT. The function symbols in the first-order
language consist of the types 𝒩 in the ttg, with the same arities as in the grammar.
Constants A and variables V in the grammar act as constants and variables, respectively, in
FOT. Let FOT_B ⊆ FOT be the subset that contains only variables and terms whose leftmost
function symbol is B. Given the well-known lattice properties of FOT (when ordered by
subsumption), one can readily show that FOT_B is a lattice when variants are taken to be
equivalent and a unique smallest element ⊥ is adjoined. We denote this lattice by FOT_B/≡.
Note that x_1^0 ≡ x_1^1 in FOT, but not in 𝒯.

Next we establish a straightforward mapping $\mu$ from $\mathcal{T}$ into FOT, as follows. Trees of the
form $N c_1 \ldots c_k$, where the $c_i$'s are constants, map to the identical first-order term $N c_1 \ldots c_k$.
For any general type $G^i$ and variable $x^i_j$, $\mu(G^i x^i_j) = x^i_j$. That is, tree variables map to
first-order variables. $\mu(\bot) = \bot$. Recursively, we map $N \tau_1 \ldots \tau_k$ to $N \tau'_1 \ldots \tau'_k$, where for each $j$,
$1 \le j \le k$,

\[
\tau'_j = \begin{cases} \tau_j & \text{if } \tau_j \text{ is a constant} \\ \mu(\tau_j) & \text{if } \tau_j \text{ is a parse tree} \end{cases}
\]

Lemma 20 For each type $B$, the mapping $\mu : \mathcal{T}(B) \to FOT_B$ is an injection and preserves
the ordering $\sqsupseteq$: if $\tau_1 \sqsupseteq \tau_2$ then $\mu(\tau_1) \sqsupseteq \mu(\tau_2)$. ∎

Recall the following definitions for first-order terms. A unifier for a pair of terms $\tau_1, \tau_2$
is a substitution $\theta$ such that $\theta(\tau_1) = \theta(\tau_2)$. A unifier $\theta$ for $\tau_1$ and $\tau_2$ is a most general unifier
(mgu) if, for any other unifier $\theta'$ of $\tau_1$ and $\tau_2$, $\theta(\tau_1) \sqsupseteq \theta'(\tau_1)$. The binary operation$^1$ $\sqcap$ on
FOT is defined as follows:

1. If $\tau_1$ or $\tau_2$ is $\bot$, then $\tau_1 \sqcap \tau_2 = \bot$.

2. If any variable occurs in both $\tau_1$ and $\tau_2$, then let $\tau'_1$ be a variant of $\tau_1$ such that $\tau'_1$ and
$\tau_2$ share no variables; $\tau_1 \sqcap \tau_2 = \tau'_1 \sqcap \tau_2$.

3. If $\tau_1$ and $\tau_2$ are not unifiable, then $\tau_1 \sqcap \tau_2 = \bot$.

4. Else let $\theta$ be a mgu of $\tau_1$ and $\tau_2$; $\tau_1 \sqcap \tau_2 = \theta(\tau_1) = \theta(\tau_2)$.

On $FOT/{\equiv}$, the operation $\sqcap$ is defined: $[\tau_1] \sqcap [\tau_2] = [\tau_1 \sqcap \tau_2]$; this is well defined, since
$\tau_1 \sqcap \tau_2$ is unique up to variants.
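
The four cases of the meet on first-order terms can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: compound terms are encoded as tuples `(functor, arg, ...)`, constants as plain strings, variables as strings beginning with `?`, and `BOTTOM`, `rename_apart`, `unify`, `resolve`, and `meet` are hypothetical names chosen for this sketch.

```python
import itertools

BOTTOM = None  # stands for the adjoined least element (assumed encoding)

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def variables(t):
    """Set of variables occurring in a term."""
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        return set().union(*(variables(a) for a in t[1:]))
    return set()

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s):
    """Return an mgu extending s, or None if a and b do not unify."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def resolve(t, s):
    """Apply substitution s to term t."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, s) for a in t[1:])
    return t

def rename_apart(t, avoid):
    """Item 2: pick a variant of t sharing no variables with `avoid`."""
    taken = variables(t) | avoid
    renaming, fresh = {}, itertools.count()
    for v in sorted(variables(t) & avoid):
        name = f"?r{next(fresh)}"
        while name in taken:
            name = f"?r{next(fresh)}"
        taken.add(name)
        renaming[v] = name
    return resolve(t, renaming)

def meet(t1, t2):
    if t1 is BOTTOM or t2 is BOTTOM:          # item 1
        return BOTTOM
    t1 = rename_apart(t1, variables(t2))      # item 2
    s = unify(t1, t2, {})
    return BOTTOM if s is None else resolve(t1, s)  # items 3 and 4
```

For instance, with the terms of Example 19 encoded as `("app", "plus", ("app", "?x1", "0"))` and `("app", "plus", "?x2")`, `meet` returns the first term, mirroring the substitution $\{x_2 := (x_1\ 0)\}$. The result is determined only up to renaming of variables, as the text notes.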

A similar definition could be given directly for trees over $\mathcal{T}$ (based on the subsumption
ordering $\sqsupseteq$ for trees), but it is convenient simply to refer to the corresponding operations on
FOT. This is possible because the image $\mu(\mathcal{T}(B))$ is closed under $\sqcap$:

1 The $\sqcap$ is not strictly an “operation” because the variant $\tau'_1$ (item 2 below) is not uniquely defined. It is
an operation on $FOT/{\equiv}$, however.

Lemma 21 Let $\tau_1$ and $\tau_2$ be trees in $\mathcal{T}(B)$. There exists an element $\tau \in \mathcal{T}(B)$ (either a
tree or $\bot$), unique up to variants, such that $\mu(\tau) = \mu(\tau_1) \sqcap \mu(\tau_2)$. ∎

Definition 22 Let $\tau_1$ and $\tau_2$ be arbitrary elements of $\mathcal{T}(B)$. Their meet is given by

\[ \tau_1 \sqcap \tau_2 = \mu^{-1}(\mu(\tau_1) \sqcap \mu(\tau_2)). \]

On $\mathcal{T}(B)/{\equiv}$, $[\tau_1] \sqcap [\tau_2] = [\tau_1 \sqcap \tau_2]$.

Theorem 23 $\mathcal{T}(B)/{\equiv}$, partially ordered by $\sqsupseteq$, is a meet semilattice whose minimum element
is $\bot$ and whose meet operation is effectively computable by $\sqcap$.

PROOF: By Lemma 20, $\mu$ is an order-isomorphism between $\mathcal{T}(B)$ and its image $\mu(\mathcal{T}(B))$
under $\mu$. Clearly $\mu(\tau_1) \equiv \mu(\tau_2)$ iff $\tau_1 \equiv \tau_2$. By Lemma 21, $[\tau_1] \sqcap [\tau_2]$ exists for any pair of
trees in $\mathcal{T}(B)/{\equiv}$, and is a greatest lower bound of $[\tau_1]$ and $[\tau_2]$ by the corresponding property
for first-order terms. ∎

The Join Operation 

$\mathcal{T}(B)/{\equiv}$ is not a lattice because there may not exist any tree $\tau_1 \sqcup \tau_2$ that subsumes both
$\tau_1$ and $\tau_2$. For example, if $B$ is a special type, we cannot join $B\,c_1$ and $B\,c_2$ if $c_1$ and $c_2$ are
distinct constants, because there is no variable of type $B$. For a general type $G^i$, however,
$G^i x^i_1$ subsumes both $G^i c_1$ and $G^i c_2$. This is the intuition behind the fact that $\mathcal{T}(G^i)/{\equiv}$ is
a lattice. However we cannot define $\sqcup$ so easily as we did $\sqcap$ (simply by mapping over to
first-order terms) because subtrees may not join. For example, when we join $G^i\,B\,c_1$ and
$G^i\,B\,c_2$ as trees, we cannot simply join the two subtrees $B\,c_j$ and then attach the result to
a root labeled $G^i$, as we would for the corresponding first-order terms.

Definition 24 We define the binary operation $\tau_1 \sqcup \tau_2$ on $\mathcal{T}(G^i)$ as follows:

• If $\tau_1 = \bot$ then $\tau_1 \sqcup \tau_2 = \tau_2$.

• If $\tau_2 = \bot$ then $\tau_1 \sqcup \tau_2 = \tau_1$.

• If any variable occurs in both $\tau_1$ and $\tau_2$, then let $\tau'_1$ be a variant of $\tau_1$ such that $\tau'_1$ and
$\tau_2$ share no variables; $\tau_1 \sqcup \tau_2 = \tau'_1 \sqcup \tau_2$.

• Otherwise $\tau_1 \sqcup \tau_2 = sup(\tau_1, \tau_2)$, where sup is computed by the algorithm in Figure 1.


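Lemma 25 For any $\tau_1, \tau_2 \in \mathcal{T}(G^i)$, $\tau = \tau_1 \sqcup \tau_2$ is a least common generalization of $\tau_1$ and
$\tau_2$. That is, $\tau \sqsupseteq \tau_1$, $\tau \sqsupseteq \tau_2$, and for any $\tau'$ such that $\tau' \sqsupseteq \tau_1$ and $\tau' \sqsupseteq \tau_2$, $\tau' \sqsupseteq \tau$.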


For every general type $G^i$, let $\varphi_{G^i}$ be an arbitrary injection from all pairs of trees in
$\mathcal{T}(G^i) - \{\bot\}$ to the tree variables $\{G^i x^i_1, G^i x^i_2, \ldots\}$. fail is a new symbol unique to
this algorithm.

Algorithm 1 $sup(\tau_1, \tau_2)$:

Input: A pair $(\tau_1, \tau_2)$ of parse trees such that no variable
       occurring in $\tau_1$ occurs in $\tau_2$.

Output: A tree, or fail.

Procedure:

Case:

1. $\tau_1$ or $\tau_2$ is a tree of the form $B\,c_1, \ldots, c_k$, where $B$ is a special type and the $c$'s
   are constants: if $\tau_1 = \tau_2$, then return $\tau_1$. Else return fail.

2. $\tau_1$ or $\tau_2$ is a tree of the form $G^i a$, where $a$ is a constant or a variable: if both $\tau_1$
   and $\tau_2$ have root $G^i$, then return $\varphi_{G^i}(\tau_1, \tau_2)$. Else return fail.

3. Otherwise, let $\tau_1 = R_1 U^1_1, \ldots, U^1_{k_1}$ and $\tau_2 = R_2 U^2_1, \ldots, U^2_{k_2}$, where the $U$'s are
   subtrees or constants.

   Case:

   3.1 $R_1 = R_2 = G^i$, where $G^i$ is a general type (and hence $k_1 = k_2 = 1$):

       3.11 If $U^1_1 = U^2_1$, then return $\tau_1$.

       3.12 Else if $U^1_1$ and $U^2_1$ are both trees and if $sup(U^1_1, U^2_1) \ne$ fail, then return
            $G^i \cdot sup(U^1_1, U^2_1)$.

       3.13 Else return $\varphi_{G^i}(\tau_1, \tau_2)$.

   3.2 $R_1 = R_2 = B$, where $B$ is a special type (and hence $k_1 = k_2$):

       3.21 For all $j$, $1 \le j \le k_1$, let $V_j = U^1_j$ if $U^1_j = U^2_j$, or $sup(U^1_j, U^2_j)$ if $U^1_j$ and
            $U^2_j$ are both trees, or fail otherwise.

       3.22 If $V_j \ne$ fail for all $j$, $1 \le j \le k_1$, then return $B \cdot V_1, \ldots, V_{k_1}$. Else return
            fail.

   3.3 Otherwise, return fail.


Figure 1: The sup algorithm.
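
A direct transcription of Figure 1 into Python may help in following the case analysis. It is a sketch under assumptions that are not in the paper: a parse tree is a tuple `(type_label, child, ...)`, a child is either a subtree, a constant (a plain string), or a `Var`; the set of general types is passed in explicitly; and the injection $\varphi_{G^i}$ is realized by a memo table that hands out one fresh tree variable per distinct pair. The names `Var`, `FAIL`, and `make_sup` are hypothetical, and the generated variable names are assumed not to clash with variables already present in the inputs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

FAIL = object()  # the special symbol "fail"

def is_tree(u):
    return isinstance(u, tuple)

def make_sup(general_types):
    table = {}  # memo table playing the role of the injection phi

    def phi(g, t1, t2):
        key = (g, t1, t2)
        if key not in table:
            table[key] = (g, Var(f"{g}_{len(table) + 1}"))
        return table[key]

    def sup(t1, t2):
        # Case 1: a special-type tree whose children are all constants.
        for t in (t1, t2):
            if t[0] not in general_types and all(isinstance(c, str) for c in t[1:]):
                return t1 if t1 == t2 else FAIL
        # Case 2: a general-type tree over a single constant or variable.
        for t in (t1, t2):
            if t[0] in general_types and len(t) == 2 and not is_tree(t[1]):
                return phi(t[0], t1, t2) if t1[0] == t2[0] else FAIL
        # Case 3: compare roots, then children.
        if t1[0] != t2[0] or len(t1) != len(t2):
            return FAIL                                   # 3.3
        root = t1[0]
        if root in general_types:                         # 3.1
            u1, u2 = t1[1], t2[1]
            if u1 == u2:
                return t1                                 # 3.11
            if is_tree(u1) and is_tree(u2):
                sub = sup(u1, u2)
                if sub is not FAIL:
                    return (root, sub)                    # 3.12
            return phi(root, t1, t2)                      # 3.13
        joined = []                                       # 3.2
        for u1, u2 in zip(t1[1:], t2[1:]):
            if u1 == u2:
                v = u1
            elif is_tree(u1) and is_tree(u2):
                v = sup(u1, u2)
            else:
                v = FAIL
            if v is FAIL:
                return FAIL                               # 3.22, failure branch
            joined.append(v)
        return (root, *joined)                            # 3.22

    return sup
```

With `sup = make_sup({"G"})`, joining `("G", ("B", "c1"))` and `("G", ("B", "c2"))` yields a tree variable of type `G`, while joining `("B", "c1")` and `("B", "c2")` directly yields `FAIL`, matching the discussion preceding Definition 24. Because the memo table returns the same variable for a repeated pair, identical pairs of subtrees generalize to the same variable, which is the property the proof of Lemma 25 relies on.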

PROOF: Observe first that $\sqcup$ is an operation on $\mathcal{T}(G^i)$: when computing $\tau_1 \sqcup \tau_2$ according
to Definition 24, the result is in $\mathcal{T}$, and is never fail.

In the case where either $\tau_1$ or $\tau_2$ is $\bot$, the proof is trivial. Otherwise, let us define the
depth of a parse tree $\tau$ to be 0 for tree constants ($B\,c_1, \ldots, c_k$) and tree variables ($G^i x^i$),
and $1 + \max_{1 \le j \le k} depth(\tau_j)$ when $\tau = B\,\tau_1, \ldots, \tau_k$. The proof is by induction on
$d = \min(depth(\tau_1), depth(\tau_2))$.

If either $\tau_1$ or $\tau_2$ has depth zero (cases 1 and 2 in the sup algorithm), it is easy to see
that sup returns fail if the two trees have no common generalization, and a least common
generalization otherwise.

For inductive purposes, assume that for every pair of trees $\tau'_1$ and $\tau'_2$ at least one of which
has depth no greater than $d$, $sup(\tau'_1, \tau'_2)$ returns a least common generalization if one exists,
or fail otherwise. Suppose, without loss of generality, that $\tau_1$ has depth $d + 1$ and that $\tau_2$
has depth at least $d + 1$. Let $\tau_i = R_i U^i_1, \ldots, U^i_{k_i}$ ($i = 1, 2$) as in step 3. If $R_1 \ne R_2$, there
is clearly no common generalization, and the algorithm correctly returns fail (step 3.3). If
$R_1 = R_2$, then necessarily $k_1 = k_2 = k$ according to the arity requirements of the grammar.

Consider cases 3.1 and 3.2 where each pair of $U^i_j$'s ($i = 1, 2$) is an identical pair or
one having a common generalization. Let $B = R_1 = R_2$, and $\tau = B \cdot U_1, \ldots, U_k$, where
$U_j = U^1_j$ for an identical pair or $sup(U^1_j, U^2_j)$ otherwise. We argue that $\tau$ is a common
generalization of $\tau_1$ and $\tau_2$. By the inductive hypothesis, $U_j$ is a least common generalization
of $U^1_j$ and $U^2_j$; hence there are unifiers $\theta^1_j$ and $\theta^2_j$ (for $1 \le j \le k$) such that $\theta^i_j(U_j) = U^i_j$.
Let $\theta^1 = \theta^1_1 \circ \ldots \circ \theta^1_k$, the composition of all the $\theta^1_j$'s, and similarly for $\theta^2$. We claim that
$\theta^1(\tau) = \tau_1$ and $\theta^2(\tau) = \tau_2$. To see this, suppose a variable $G^p x^p_q$ occurs in two or more of
the $U_j$'s. Since $\varphi_{G^p}$ is an injection, the two pairs of terms that gave rise to it must have
been identical. Thus where the domains of the substitutions $\theta^1_j$ (for $1 \le j \le k$) agree, their
codomains also agree, i.e., the same variables are mapped to the same values. Hence

\[ \theta^1(B \cdot U_1, \ldots, U_k) = B \cdot \theta^1_1(U_1), \ldots, \theta^1_k(U_k) = \tau_1, \]

and, similarly, $\theta^2(\tau) = \tau_2$. $\tau$ is thus a common generalization of $\tau_1$ and $\tau_2$. Let $\tau'$ be another
common generalization. Either $B$, the root of $\tau'$, is a general type $G^i$ and $\tau' = G^i x^i_l$ for
some variable $x^i_l$, or $\tau' = B\,V_1, \ldots, V_k$ for some subtrees $V_j$ ($1 \le j \le k$). In the former case,
it is clear that $\tau' \sqsupseteq \tau$. In the latter, we know that $V_j \sqsupseteq U_j$ for each $j$, by the inductive
hypothesis, and, again, $\tau' \sqsupseteq \tau$. It follows that $\tau$ is a least common generalization.

If, in cases 3.1 and 3.2, there is some $j$ such that $sup(U^1_j, U^2_j) =$ fail, then by the
inductive hypothesis the $U^i_j$'s have no common generalization. Thus there is no term
$\tau' = B\,U_1, \ldots, U_k$ such that $\tau' \sqsupseteq \tau_1$ and $\tau' \sqsupseteq \tau_2$. In case 3.2, sup correctly returns fail. In case
3.1, however, where $B$ is a general type $G^i$, there is a generalization of the form $G^i x^i$, and
such a generalization is returned by sup. Any other generalization $\tau'$ must also be a tree
variable of type $G^i$, whereupon $\tau' \sqsupseteq \tau$.

Thus the inductive hypothesis holds for depth $d + 1$ as well, and the proof is complete. ∎

As with the meet, we define $[\tau_1] \sqcup [\tau_2]$ to be $[\tau_1 \sqcup \tau_2]$, which is easily shown to be well
defined on $\mathcal{T}(G^i)/{\equiv}$.

Lemma 25, together with Theorem 23, gives us:

Theorem 26 $\mathcal{T}(G^i)/{\equiv}$, partially ordered by $\sqsupseteq$, is a lattice whose meet ($\sqcap$) and join ($\sqcup$)
operations are effectively computable.

To argue that $\mathcal{T}(G^i)/{\equiv}$ is a complete lattice, we note that the ordering $\sqsubseteq$ is Noetherian:
for any $[\tau]$, there exist only finite chains

\[ [\tau] \sqsubset [\tau_1] \sqsubset \ldots \sqsubset [\tau_k]. \]

Thus every subset of $\mathcal{T}(G^i)/{\equiv}$ has a maximum element in $\mathcal{T}(G^i)/{\equiv}$, and completeness then
follows from basic lattice theory (e.g., [7, Chap. 1]).

To summarize the main result of this section: 

Theorem 27 Let $B$ be a non-terminal symbol of a typed-term grammar. With the adjunction
of a unique least element $\bot$, the set $\mathcal{L}(B)$ of terms generated by $B$, modulo equivalence
under variants ($\equiv$) and partially ordered by subsumption ($\sqsupseteq$), is a meet semilattice. For a
general type $G^i$, $\mathcal{L}(G^i)$ is a complete lattice. Finally, the meet and join operations on terms
are effectively computable.

Non-deterministic Term-Rewriting Systems 

We now define a class of term-rewriting systems over a typed-term algebra. TRS’s are 
an active research area of theoretical computer science and have already been applied to 
machine learning (e.g., [18, 17]). Mooney [24] has applied them to analytical learning as an
alternative to predicate logic. See [3] for a recent survey of general research on TRS’s. For 
our purposes, a TRS enables us to express our learning algorithms in a form applicable to 
many computational models, including logic programming and lambda calculus.

The term-rewriting systems that we shall use are non-deterministic in that, of all rewrite
rules that may be applicable at any stage of the computation, the system always chooses a 
rule that ultimately leads to a successful computation, unless no such rule exists. In effect, the 
assumption of non-determinism abstracts away all of the backtracking search that occurs in 
an actual, deterministic system. This is appropriate, since our analytical learning algorithms
learn from the fruits of a successful search. Further, it is by focusing on non-deterministic 
term-rewriting systems that we are able to express our analytical learning algorithms in a 
general form applicable to many computational models. These models differ widely with 
regard to the mechanisms available for removing non-determinism. Hence we would lose this 
generality if we focused only on deterministic models. 

We define a non-deterministic, typed-term rewriting system (NTTRS) as follows. Starting
with a typed-term language (including the subsumption ($\sqsupseteq$) and meet ($\sqcap$) relations on the
terms of that language), we add a rewriting relation, as follows:

• $\mathcal{L}(G^0)$, the set of terms generated from the principal general type $G^0$, is interpreted as
a set of configurations (or states) of the system.

• $\mathcal{R}$ is a recursive set of rewrite rules (rules for short) of the form $\langle \alpha, \beta \rangle$, where both $\alpha$ and
$\beta$ are in $\mathcal{L}(B)$ for some type $B$. $B$ is called the type of the rule. A rule may have more
than one type. $\mathcal{R}$ is closed under substitution: for any substitution $\theta$, $\langle \theta(\alpha), \theta(\beta) \rangle \in \mathcal{R}$
if $\langle \alpha, \beta \rangle \in \mathcal{R}$.

• A rewriting step (or step) is a binary relation, written $\Rightarrow$, on $\mathcal{L}(G^0)$. $\tau_1 \Rightarrow \tau_2$ iff:

  - $\tau_1, \tau_2 \in \mathcal{L}(G^0)$;

  - $\langle \alpha, \beta \rangle \in \mathcal{R}$ is a rule (let $B$ be the rule type);

  - $\omega$ is a position in $\tau_1$ such that $\tau_1[\omega]$ is a subterm of type $B$ and $\tau_1[\omega] = \alpha$;

  - $\tau_2 = \tau_1[\omega \leftarrow \beta]$.

More succinctly, we rewrite a configuration $\tau_1$ by finding a type-$B$ occurrence of $\alpha$ in $\tau_1$ and
replacing that subterm by $\beta$. By Lemma 12, the resulting term $\tau_2$ is also a configuration.
$\Rightarrow^*$ is the reflexive, transitive closure of $\Rightarrow$.
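
A single rewriting step can be illustrated with a small Python sketch. It makes several simplifying assumptions that are not part of the definition above: terms are untyped nested tuples rather than typed parse trees, variables are strings beginning with `?`, positions are tuples of 0-based indices into those tuples, and closure under substitution is realized by one-way matching of the rule's left side against the chosen subterm. The names `match`, `substitute`, `rewrite_at`, `RULE_I`, and `RULE_II` are hypothetical; the two rules encode a Peano-style addition program in applicative form.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def match(pattern, term, binding):
    """Extend binding so that the instantiated pattern equals term, or return None."""
    if is_var(pattern):
        if pattern in binding:
            return binding if binding[pattern] == term else None
        return {**binding, pattern: term}
    if isinstance(pattern, tuple) and isinstance(term, tuple) and len(pattern) == len(term):
        for p, t in zip(pattern, term):
            binding = match(p, t, binding)
            if binding is None:
                return None
        return binding
    return binding if pattern == term else None

def substitute(t, binding):
    if is_var(t):
        return binding.get(t, t)
    if isinstance(t, tuple):
        return tuple(substitute(a, binding) for a in t)
    return t

def subterm(t, pos):
    for i in pos:
        t = t[i]
    return t

def replace(t, pos, new):
    if not pos:
        return new
    i = pos[0]
    return t[:i] + (replace(t[i], pos[1:], new),) + t[i + 1:]

def rewrite_at(config, pos, rule):
    """One step: instantiate rule (alpha, beta) at position pos, or return None."""
    alpha, beta = rule
    binding = match(alpha, subterm(config, pos), {})
    if binding is None:
        return None
    return replace(config, pos, substitute(beta, binding))

# ((plus 0) x1) = x1   and   ((plus (succ x2)) x3) = (succ ((plus x2) x3))
RULE_I = ((("plus", "0"), "?x1"), "?x1")
RULE_II = ((("plus", ("succ", "?x2")), "?x3"), ("succ", (("plus", "?x2"), "?x3")))
```

Starting from `(("plus", ("succ", "0")), "0")`, applying `RULE_II` at the root and then `RULE_I` at position `(1,)` gives `("succ", "0")`. Variables that occur only on the right side of a rule are left uninstantiated by `substitute`, which is consistent with rules being allowed to introduce new variables.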

A configuration to which no rules can be applied is said to be irreducible. The general
theory of term-rewriting systems (TRS’s) deals with such issues as the existence and
uniqueness, for each configuration $\tau$, of an irreducible form $\tau'$ such that $\tau \Rightarrow^* \tau'$, but these issues
are beyond the scope of our concerns. A difference between our definition of TRS’s and one
that is often used in the literature is that we do not require that $V(\beta)$ (the set of variables
occurring in $\beta$) be contained in $V(\alpha)$. Logic programming is an example of a TRS where
rules may introduce new variables on the righthand side of a rule.

Example 28 [LP] Refer back to the ttg for logic programming (Example 2) and to the
simple program for addition (plus) in the introduction. A configuration is a goal, possibly
conjunctive. This also includes the goal true and goal variables such as $g_1$. The rewrite rules
are the Horn clauses. For example, the first rule,

plus(0, $x_1$, $x_1$) :- true        (i)

can be viewed as a schema of rules in which goals of the form plus(0, $\tau$, $\tau$) (where $\tau$ is any
sentence of type Term) can be rewritten to the goal true.

The configuration $\wedge$(plus(s(0), 0, s(0)), true) can be rewritten by applying rule (ii) to
the subterm plus(s(0), 0, s(0)). More precisely, the rule we are applying is rule (ii) in which
the value 0 has been substituted for each of the variables $x_2$, $x_3$, and $x_4$. By closure under
substitution, this is also a rule.$^2$

After rewriting, we have the new configuration:

$\wedge$(plus(0, 0, 0), true).

2 The process of instantiating the left side $\alpha$ of a rule so as to match a subterm in the goal and then applying
the resulting, instantiated rule to the configuration is often called demodulation, to make a procedural
distinction from rewriting.

This new configuration can in turn be rewritten by applying rule (i) (with $x_1 = 0$), yielding

$\wedge$(true, true).

At this point the configuration is irreducible. ∎

Example 29 [AP] Refer back to the ttg for the simple applicative language in Example 2.
Configurations are any sentence of type Expression. Apart from the syntax, rewrite rules
resemble the Curried functional patterns used in such programming languages as FL, ML,
and Haskell [13]. To emphasize this similarity, we shall write rules in the form $\alpha = \beta$, instead
of $\langle \alpha, \beta \rangle$. For example, a program for addition similar to the one discussed in the preceding
example is as follows:

((plus 0) $x_1$) = $x_1$        (i)

((plus (succ $x_2$)) $x_3$) = (succ ((plus $x_2$) $x_3$))        (ii)

In this language, zero is represented by the constant 0, and the successor of a number $n$
is represented by (succ $n$).

Using the two rules above and their instantiations, we obtain the following sequence of
rewrites:

((plus (succ 0)) 0) $\Rightarrow$ (succ ((plus 0) 0)) $\Rightarrow$ (succ 0).

Although the AP term rewriting system is completely different from that of LP, one can
see that the program for plus is essentially the same as the one in Example 28, and that
there is a direct correspondence between rules in the two systems. ∎

Example 30 [LC] A configuration in our lambda-calculus language is any term of type
Expression. The rewrite rules fall into two groups. The first group contains all rules of the
form:

$((\lambda v.Q)\ R) \Rightarrow [R/v]Q$,

where $Q$ and $R$ are configurations and $[R/v]Q$ is the result of substituting $R$ for the free
occurrences of $v$, according to the standard rules for $\beta$-reductions:

• $[R/v]v = R$;

• $[R/v]v_1 = v_1$ if $v_1 \ne v$;

• $[R/v](E\ F) = ([R/v]E\ [R/v]F)$;

• $[R/v]\lambda v.E = \lambda v.E$;

• $[R/v]\lambda v_1.E = \lambda v_1.[R/v]E$ if $v_1 \ne v$, and either $v_1$ does not occur free$^3$ in $R$ or $v$ does
not occur free in $E$;

• $[R/v]\lambda v_1.E = \lambda v_2.[R/v][v_2/v_1]E$ if $v_1 \ne v$, $v_1$ occurs free in $R$, and $v$ occurs free in $E$.
($v_2$ is a fresh variable occurring in neither $R$ nor $E$.)

3 free: outside the scope of a $\lambda v_1$.

For example,

$((\lambda v_1.(v_1\ x_1))\ x_2) \Rightarrow (x_2\ x_1)$

is a rule in the first group.

The second group of rewrites (the group that constitutes the actual “program”) consists
of a list of name-expression pairs: $\langle f, \text{(some expression)} \rangle$. Such a rule indicates that an
occurrence of the name $f$ in the configuration can be replaced by the associated expression.
Often called a $\delta$-reduction, this is also a popular way to implement recursion in programming
languages (like Lisp), since the replacing expression may also contain the name $f$. (Fixpoint
combinators are another way to define recursion.)

To illustrate, let us recode the plus program from the preceding example in LC. “Zero”
(0) is encoded by the expression $\lambda v.v$. We represent ordered pairs $[z_1, z_2]$ of objects $z_1$ and
$z_2$ as

$[z_1, z_2] = \lambda v_1.((v_1\ z_1)\ z_2)$.

The integer “one” is represented by $[s, 0]$, “two” by $[s, [s, 0]]$, etc., where $s$ is an abbreviation
for the expression $\lambda v_1.\lambda v_2.v_2$. The successor (succ $t$) of an integer $t$ is computed by the
function

succ $\Rightarrow \lambda v.[s, v]$

Let $\lambda v_1.\lambda v_2.v_1$ and $\lambda v_1.\lambda v_2.v_2$ represent true and false, respectively. A predicate zero?
that tests whether an integer is zero, giving true if so and false if not, is as follows:

zero? $\Rightarrow \lambda v_1.(v_1\ (\lambda v_2.\lambda v_3.v_2))$.

One can check that (zero? 0) $\Rightarrow^*$ true and (zero? (succ $z$)) $\Rightarrow^*$ false.

We also need a function that extracts the second member of a pair:

second $\Rightarrow \lambda v_1.(v_1\ (\lambda v_2.\lambda v_3.v_3))$.

The program for integer addition consists of the rewrite rules for succ, zero?, and second
above, and the following rule for plus:

plus $\Rightarrow \lambda v_1.\lambda v_2.(((zero?\ v_1)\ v_2)\ (succ\ ((plus\ (second\ v_1))\ v_2)))$.

plus begins by applying zero? to its first argument. If the result is true, the true expression
selects $v_2$, the second argument of plus. If false, the result is the successor of
((plus (second $v_1$)) $v_2$), that is, plus is applied recursively.
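
The encodings above can be checked mechanically by transcribing them into Python closures. This is only a sketch under assumptions: Python evaluates arguments eagerly, so the two branches selected by the Boolean are wrapped in zero-argument lambdas (thunks) and the chosen one is called afterwards; this differs from the rewrite rule for plus, whose non-deterministic reduction simply never rewrites the discarded branch. The helper `to_int` and the upper-case names are hypothetical and exist only to inspect results.

```python
ZERO = lambda v: v                              # "zero" is the identity
S = lambda v1: lambda v2: v2                    # the tag s (same expression as false)
TRUE = lambda v1: lambda v2: v1
FALSE = lambda v1: lambda v2: v2

def pair(z1, z2):
    """[z1, z2] = lambda v1 . ((v1 z1) z2)"""
    return lambda v1: v1(z1)(z2)

succ = lambda v: pair(S, v)
zero_p = lambda v1: v1(lambda v2: lambda v3: v2)    # zero?
second = lambda v1: v1(lambda v2: lambda v3: v3)

def plus(v1):
    # Thunks delay both branches; only the selected one is evaluated.
    return lambda v2: zero_p(v1)(lambda: v2)(lambda: succ(plus(second(v1))(v2)))()

def to_int(n):
    """Decode a numeral by counting how many times `second` can be applied."""
    steps = 0
    while zero_p(n)(False)(True):   # the encoded Boolean picks a Python bool
        steps += 1
        n = second(n)
    return steps
```

For example, with `one = succ(ZERO)`, the call `to_int(plus(one)(one))` evaluates to 2.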

Once again, although the definition of the plus program in lambda-calculus is completely
different from the logic programming and the applicative versions, the structures of all three
plus programs are quite similar. ∎

Definition 31 A computation in a NTTRS is a finite sequence of configuration-position-rule
triples,

\[ [\tau_1, \omega_1, \langle \alpha_1, \beta_1 \rangle], \ldots, [\tau_n, \omega_n, \langle \alpha_n, \beta_n \rangle], [\tau_{n+1}, *, *] \qquad (1) \]

where, for $1 \le i \le n$, an instance of the rule $\langle \alpha_i, \beta_i \rangle$ applied to $\tau_i$ at location $\omega_i$ yields $\tau_{i+1}$
(* indicates “don’t care”). The path of the computation consists of just the locations and
the rules (ignoring the configurations).

Note that, for a given rule $\langle \alpha, \beta \rangle$ and subterm $\tau[\omega]$, there is a unique instance of the rule
that rewrites the subterm, namely, $\langle \theta(\alpha), \theta(\beta) \rangle$, where $\theta(\alpha) = \tau[\omega]$. This is a consequence
of Theorem 27.

A path is said to be maximally general if each rule of the path is maximally general.
That is, if $\langle \alpha_i, \beta_i \rangle$ is the $i$’th rule in the path, there is no rule $\langle \alpha', \beta' \rangle$ of which $\langle \alpha_i, \beta_i \rangle$ is a
substitution instance. In this paper, “path” will always be taken to mean “maximally-general
path”.

Example 32 [LP] Consider once again the simple logic program for plus. When the initial
configuration is the goal plus(s(0), s(0), s(s(0))), we obtain the following computation.

[plus(s(0), s(0), s(s(0))), $\epsilon$, (ii)] $\Rightarrow$ [plus(0, s(0), s(0)), $\epsilon$, (i)] $\Rightarrow$ [true, *, *].

In each step the position is $\epsilon$, so the path of this computation is [$\epsilon$, (ii)] $\Rightarrow$ [$\epsilon$, (i)]. ∎

Example 33 [AP] The plus program of Example 29 gives the following computation
corresponding to $1 + 1 \Rightarrow^* 2$.

[((plus (succ 0)) (succ 0)), $\epsilon$, (ii)] $\Rightarrow$

[(succ ((plus 0) (succ 0))), $1 \cdot 3$, (i)] $\Rightarrow$

[(succ (succ 0)), *, *]. ∎

Example 34 [LC] Following is a portion of a computation corresponding to $1 + 1 \Rightarrow^* 2$,
using the program in Example 30. In the sequence, successive configurations are shown.
To save space, multiple steps have been combined. To the right of each configuration is an
indication of what rules were applied: [$\beta^+$] refers to one or more $\beta$-rewrites, [plus] signifies a
substitution for the name plus, etc. The subterm that is rewritten is underlined.

((plus [s,0]) [s,0])                                                                  [plus]
(($\lambda v_1.\lambda v_2$.(((zero? $v_1$) $v_2$) (succ ((plus (second $v_1$)) $v_2$))) [s,0]) [s,0])    [$\beta^+$]
(((zero? [s,0]) [s,0]) (succ ((plus (second [s,0])) [s,0])))                          [zero?, $\beta^+$]
((false [s,0]) (succ ((plus (second [s,0])) [s,0])))                                  [$\beta^+$]
(succ ((plus (second [s,0])) [s,0]))                                                  [second, $\beta^+$]
(succ ((plus 0) [s,0]))                                                               [plus]
. . . (several steps omitted here)
(succ [s,0])                                                                          [$\beta^+$]
[s,[s,0]]

Thus over this path we find that ((plus [s,0]) [s,0]) $\Rightarrow^*$ [s,[s,0]], as expected. ∎

In the three preceding examples, the plus programs are quite similar, both in the way they 
represent naturals and in the recursive definition of the plus function. The paths, however, 
are not all similar. Whereas the LP and AP paths for “1 + 1 = 2” are easily comparable, 
the LC path is very different, since the TRS rules are quite distinctive. In the next two 
sections we present two analytical learning algorithms, AL-1 and AL-2. The question we 
should anticipate is this: will the results of the learning algorithms be comparable in all three 
TRS’s, or will the results be difficult to relate, especially in the LC system vis-à-vis the other
two? The answer to this question gives us insight into the nature of the learned information.
For, if the learned structures are similar in the LP and AP cases but not in the LC case,
then what is learned pertains more to the path of the computation than to the semantics of
what it is computing. Conversely, if the learned information is similar in all three languages,
the learning algorithm is acquiring semantic concepts rather than syntactical or operational 
ones. 


The AL-1 Algorithm 

If, in Example 32, our initial configuration had been plus(s(0), s(s(0)), s(s(s(0)))) or
plus(s(0), 0, s(0)), the total computation would have followed the identical path. It is not
difficult to see that plus(s(0), $x_1$, s($x_1$)) is the most general goal whose computation follows
this path. In some sense, once we have proved the latter goal, we get all the former goals
almost “for free”, since they are just instances. This is the idea behind the well-known
algorithm called EBG or goal regression. We shall formalize this process for NTTRS’s
and call the resulting algorithm AL-1. The new name is justified, since new considerations
arise in the more general setting of TRS’s that are irrelevant to the special case of logic
programming, as the next example illustrates.
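
For the logic-programming case, the generalization described above can be reproduced with a short Python sketch. It is not the AL-1 algorithm, only an illustration under restrictive assumptions: every step of the path rewrites the whole goal (position $\epsilon$), the rules on the path use pairwise disjoint variable names, terms are tuples `(functor, arg, ...)`, and variables are strings beginning with `?`. The names `generalize`, `RULE_I`, and `RULE_II` are hypothetical.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, s):
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    return t == v or (isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:]))

def unify(a, b, s):
    """Return an mgu extending s, or None."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return unify(b, a, s)
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def resolve(t, s):
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(a, s) for a in t[1:])
    return t

def generalize(path):
    """Most general goal rewritable along `path`, a list of (head, body) rules
    applied at the root; returns None if the path cannot be followed."""
    s = {}
    first_head, current = path[0]
    for head, body in path[1:]:
        s = unify(current, head, s)   # the body so far must match the next head
        if s is None:
            return None
        current = body
    return resolve(first_head, s)

# plus(0, x1, x1) :- true            plus(s(x2), x3, s(x4)) :- plus(x2, x3, x4)
RULE_I = (("plus", "0", "?x1", "?x1"), "true")
RULE_II = (("plus", ("s", "?x2"), "?x3", ("s", "?x4")), ("plus", "?x2", "?x3", "?x4"))
```

Calling `generalize([RULE_II, RULE_I])` returns `("plus", ("s", "0"), "?x1", ("s", "?x1"))`, that is, plus(s(0), $x_1$, s($x_1$)).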

Given a program and a path for that program, we should like to determine the most 
general configuration that can be rewritten using that path. Unfortunately, no such config- 
uration exists, in general. The next example shows why. 

Example 35 [LC] For purposes of this example let us modify the grammar in Example 4
by replacing the fifth production rule as follows:

Expression1 $\to$ $\lambda$ Lambda-param Expression.

Algorithm 2 (Stretch)

Input: Configurations $\tau_1$ and $\tau_2$, with $\tau_1 \sqsupseteq \tau_2$;
       A position $\omega \in \Omega(\tau_2)$.

Output: A configuration $\tau_{12}$ such that $\tau_1 \sqsupseteq \tau_{12} \sqsupseteq \tau_2$ and $\omega \in \Omega(\tau_{12})$.

Procedure:

1. If $\omega \in \Omega(\tau_1)$, then return $\tau_1$.

2. Else let $\hat{\omega}$ be the longest prefix of $\omega$ such that $\hat{\omega} \in \Omega(\tau_1)$ and the node at location $\hat{\omega}$ is
   labeled by a general type, say $G^j$.

   Remark: The same general type $G^j$ necessarily occurs at position $\hat{\omega}$ of the parse of
   $\tau_2$, and $\tau_1[\hat{\omega}] = x^j_i$ for some $i$. See text.

3. Compute Expand($\tau_2, \hat{\omega}$).

   Remark: Expand computes a replacement for the term at position $\hat{\omega} \in \Omega(\tau_1)$, consisting
   of an appropriate term of the same type. See Lemma 37.

4. Let $\theta$ be the substitution that maps $x^j_i$ to Expand($\tau_2, \hat{\omega}$) and maps other variables to
   themselves; let $\tau' := \theta(\tau_1)$.

5. Return $\tau_{12}$ = Stretch($\tau', \tau_2, \omega$).

Figure 2: The Stretch algorithm.

(The dot separating the parameter from the expression has been omitted.) Now consider
the following two non-unifiable LC configurations:

$\tau_1$ : $\lambda\, v_1\ (\lambda\, v_2\ v_2\ x_1)$

$\tau_2$ : $(succ\ (\lambda\, v_2\ v_2\ x_1))$

The underlined expression, $(\lambda\, v_2\ v_2\ x_1)$, can be rewritten with a $\beta$-reduction rule to $x_1$.
Moreover, in both $\tau_1$ and $\tau_2$, this term occurs at the same location, $\omega = 1 \cdot 3$. Thus, in each
configuration, the same path of length one can be used to rewrite this underlined lambda
expression to $x_1$. But what is the most general expression such that this rule can be applied
at position $1 \cdot 3$? By the sup algorithm, $\tau_1 \sqcup \tau_2 = x$, where $x$ is a fresh variable of type
Expression. Because position $\omega$ does not occur in the term $x$, the path used to reduce $\tau_1$ and
$\tau_2$ does not apply to their least generalization. Hence no configuration that subsumes both
$\tau_1$ and $\tau_2$ contains the redex $(\lambda\, v_2\ v_2\ x_1)$ at position $\omega = 1 \cdot 3$, and there is no most-general
configuration to which the path applies. ∎

If we cannot look for the most general configuration that can be rewritten along a given
path, we can instead determine the most general configuration that both is rewritable along
the path and subsumes $\tau_0$, where $\tau_0$ is an explanation (or example): a ground configuration
together with a computation along the path.

The AL-1 algorithm does just this for a general NTTRS. Before presenting it, however,
we need to describe a procedure, Stretch, that plays an essential role in the AL-1 algorithm.
Stretch takes two configurations $\tau_1$ and $\tau_2$, where $\tau_1 \sqsupseteq \tau_2$, and a position $\omega \in \Omega(\tau_2)$, and
returns a maximal configuration $\tau_{12}$ such that $\tau_1 \sqsupseteq \tau_{12} \sqsupseteq \tau_2$ and $\omega \in \Omega(\tau_{12})$. Later we shall
argue that this configuration is unique up to variants.

If $\omega \in \Omega(\tau_1)$ then we can simply let $\tau_{12} = \tau_1$ and stop. Otherwise let $\hat{\omega}$ be the longest
prefix of the integer string $\omega$ such that $\hat{\omega}$ exists as a location in $\tau_1$ and such that the node
at location $\hat{\omega}$ in the parse of $\tau_1$ is a non-terminal, say, $B$. Such an $\hat{\omega}$ surely exists, since, if
nothing else, the empty string $\epsilon$ is a prefix of $\omega$, and the node at location $\epsilon$ is labeled by
the start symbol $G^0$. The corresponding node in the parse of $\tau_2$ must be labeled with the
same non-terminal $B$, since $\tau_1 \sqsupseteq \tau_2$. Furthermore the tree rooted at location $\hat{\omega}$ in the parse
of $\tau_1$ must be a tree variable of the form $G^j x^j_i$ for some $j$, since for any other tree, either $\hat{\omega}$ is
not the longest matching prefix of $\omega$, or $\hat{\omega} = \omega$, or $\tau_1 \not\sqsupseteq \tau_2$, contrary to assumption. Let
$G^j \to B_1$ be the production corresponding to the node at location $\hat{\omega}$ in $\tau_2$. We replace the
configuration $\tau_1$ by a new configuration $\tau'_1$ whose parse tree is derived from that of $\tau_1$ by
replacing each occurrence of the subtree $G^j \cdot x^j_i$ (including the one at $\hat{\omega}$) by the tree $G^j \cdot B_1$.
With a non-terminal leaf node $B_1$, this is, of course, an incomplete parse tree. But a simple
algorithm can be invoked to expand $B_1$ to its unique most general parse subtree in $\mathcal{L}(B_1)$
while preserving the $\sqsupseteq$ relation to the parse of $\tau_2$. Having thus “stretched” the term $\tau_1$ to
$\tau'_1$, we repeat the procedure until the resulting configuration contains the required location
$\omega$.

The algorithm Stretch is given in detail in Figure 2; the subroutine Expand is given in 
Figure 3. 

Example 36 [LC] We illustrate the result of applying Stretch to two terms and a given
position, following the steps in the Stretch and Expand procedures. Let $\tau_1 = x_1$, a maximal
lambda term. Let $\tau_2 = \lambda v_1.(\lambda v_2.v_2\ x_7)$ and $\omega = 1 \cdot 4$ (the term at this position in $\tau_2$ is
$(\lambda v_2.v_2\ x_7)$). The parse tree, tree($\tau_1$), is: Expression $x_1$. The tree, tree($\tau_2$), is:

Expression Expression1 $\lambda$ Lambda-param $v_1$ . Expression Expression1 (etc.) .

$\omega$ does not occur in $\tau_1$, and the maximal prefix $\hat{\omega}$ is $\epsilon$. We therefore replace $\tau_1$ by the term
$\tau'_1$ with the incomplete parse tree, Expression Expression1. (This is the result of step 1 in
Expand.) Expression1 is a special type, so we expand it (Expand, step 2.3) in such a way as
to unify with the parse tree for $\tau_2$:

Expression Expression1 $\lambda$ Lambda-param . Expression

Since Lambda-param and Expression in this tree are non-terminal leaf nodes, this tree is still
incomplete. We therefore continue expanding them by recursively calling Expand, with the
result:

Expression Expression1 $\lambda$ Lambda-param $v_3$ . Expression $x_2$.

Here, step 2.2 has been applied; $v_3$ is a fresh variable of type Lambda-param, and $x_2$ is a
fresh variable of type Expression.

The resulting configuration, $\lambda v_3.x_2$, now has a term, $x_2$, at position $\omega = 1 \cdot 4$; hence this
term is output as $\tau_{12}$, and the Stretch process is complete. Note that $\tau_1 \sqsupseteq \tau_{12} \sqsupseteq \tau_2$, as
required. ∎
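
The Stretch and Expand procedures can be prototyped compactly in Python. The sketch below rests on assumptions that go beyond the text: parse trees are tuples `(type_label, child, ...)` whose children are subtrees, constants (plain strings), or `Var` objects; positions are tuples of 1-based child indices; a position counts as present only if it reaches a type-labeled node; and the set `GENERAL` of general types is fixed by hand for the lambda-calculus grammar. All identifiers are hypothetical, and `T2` is a simplified stand-in for the $\tau_2$ of Example 36 in which the inner application is abbreviated to a variable.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Var:
    name: str

GENERAL = {"Expression", "Lambda-param"}   # assumed general types
_fresh = count(1)

def fresh_var():
    return Var(f"_fresh{next(_fresh)}")

def node_at(tree, pos):
    """Node reached by following 1-based child indices, or None if absent."""
    for i in pos:
        if not isinstance(tree, tuple) or not 1 <= i < len(tree):
            return None
        tree = tree[i]
    return tree

def has_position(tree, pos):
    return isinstance(node_at(tree, pos), tuple)

def expand(tree, pos):
    """Most general tree with the shape of tree[pos] down to its general types."""
    label, *children = node_at(tree, pos)
    parts = []
    for i, child in enumerate(children, start=1):
        if isinstance(child, str):            # step 2.1: constant
            parts.append(child)
        elif isinstance(child, Var):          # step 2.2: variable
            parts.append(fresh_var())
        elif child[0] in GENERAL:             # step 2.2: general type
            parts.append((child[0], fresh_var()))
        else:                                 # step 2.3: other non-terminal
            parts.append(expand(tree, pos + (i,)))
    return (label, *parts)

def replace_all(tree, old, new):
    if tree == old:
        return new
    if isinstance(tree, tuple):
        return (tree[0], *(replace_all(c, old, new) for c in tree[1:]))
    return tree

def stretch(t1, t2, pos):
    if has_position(t1, pos):                 # step 1
        return t1
    for k in range(len(pos), -1, -1):         # step 2: longest general-type prefix
        node = node_at(t1, pos[:k])
        if isinstance(node, tuple) and node[0] in GENERAL:
            break
    else:
        raise ValueError("no general-type prefix; t1 must subsume t2")
    if not (len(node) == 2 and isinstance(node[1], Var)):
        raise ValueError("expected a tree variable at the prefix")
    stretched = replace_all(t1, node, expand(t2, pos[:k]))   # steps 3 and 4
    return stretch(stretched, t2, pos)                       # step 5

T1 = ("Expression", Var("x1"))
T2 = ("Expression", ("Expression1", "λ", ("Lambda-param", Var("v1")), ".",
                     ("Expression", Var("x7"))))
```

Here `stretch(T1, T2, (1, 4))` produces a tree of the shape Expression Expression1 λ Lambda-param v . Expression x with two fresh variables, which corresponds to the configuration $\lambda v_3.x_2$ obtained in Example 36.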

The next two technical lemmas characterize the relevant properties of Stretch. 

Lemma 37 Let $\tau$ be any configuration and $\hat{\omega}$ a location in $\tau$ such that $\tau[\hat{\omega}] \in \mathcal{L}(B)$.
Expand($\tau, \hat{\omega}$) returns a most general term $\tau'$ satisfying these conditions: (1) $\tau' \in \mathcal{L}(B)$; (2)
$\tau' \sqsupseteq \tau[\hat{\omega}]$; and (3) for all $i \in \mathbb{N}$, if the node at location $\hat{\omega} \cdot i$ in tree($\tau$) is labeled by a type
or a constant (anything other than a variable), then the same label is assigned to the node
at location $i$ in tree($\tau'$). That is, the tree $\tau'$ matches the structure of $\tau[\hat{\omega}]$, except that some
subtrees of $\tau[\hat{\omega}]$ may be replaced in $\tau'$ by tree variables.

Note: Another way to state this is that

\[ [\text{Expand}(\tau, \hat{\omega})] = \bigsqcup \{ [\tau'] \mid \tau' \text{ satisfies conditions (1)-(3)} \}. \]

Although $\mathcal{L}(B)/{\equiv}$ is not a lattice, the conditions ensure that this join exists.

PROOF: By induction on the height $h$ of $\tau[\hat{\omega}]$. For $h = 1$, the parse of $\tau[\hat{\omega}]$ is either $B \cdot c_1 \ldots c_k$,
where each $c_j$ is some constant, or possibly $G^j \cdot x^j_i$ if $B$ is the general type $G^j$. In the former
case, step 2.1 applies, providing a term $\tau'$ such that $\tau' = \tau[\hat{\omega}]$. In the latter, step 2.2 provides
a term $\tau'$ consisting of a fresh variable. Either way, the requirements of the lemma are
satisfied.

Inductively, for $h > 1$ the production in step 1 is of the form $B \to \zeta_1 \ldots \zeta_k$, where at least
one of the $\zeta_i$ is a non-terminal. (Note that productions of the form $G^j \to B'$ are covered by
this case.) Proceeding again by cases, for those $\zeta_i$ that are constants, step 2.1 applies. For
$\zeta_i = B_1$ a special type, case 2.3 applies: Expand is called recursively, and by induction, the
result is a term of type $B_1$ maximally general while remaining $\sqsupseteq \tau[\hat{\omega} \cdot i]$. $\zeta_i$ cannot be a
variable, so the only remaining case is where $\zeta_i$ is a general type $G^r$. Instead of expanding $G^r$
further, Expand supplies the most general term of the same type, namely a fresh variable,
so as to fulfill the maximum-generality requirement. Finally, the results of expansion of all
these $\zeta_i$ are concatenated in step 3 into $\tau'$, a most general term that is $\sqsupseteq \tau[\hat{\omega}]$ and that preserves
types at the level below $B$. ∎

Lemma 38 Using the notation of Figure 2, Stretchy, t 2 ,w) computes a most general con- 
figuration t 22 such that □ t 22 □ T 2> w € fl(fi 2 )- 

Note: Another way to state this is that 

[Stretch(ri , r 2 , u>)] = |J{[ti 2 ] | □ r 12 □ r 2 and u € fl(ri 2 )}. 

Since the r, are configurations — and hence terms in £(£7°) — the join exists, by Theorem 27. 

PROOF: If the algorithm exits in step 1, the proof is trivial. Otherwise, let u> be the proper 
prefix of u> found in step 2 of the initial call to the algorithm. Since is a term of type 
Algorithm 8a (Expand)

Input: A configuration $\tau$;

A position $\omega \in \Omega(\tau)$ such that a non-terminal $B$ occurs at position $\omega$
in the parse of $\tau$.

Output: A term in $\mathcal{L}(B)$.

Procedure:

1. Let $B \to \zeta_1 \cdots \zeta_k$ be the production in the grammar that corresponds to the
node at position $\omega$ in the parse of $\tau$. Initialize: $\rho_i := \epsilon$ for all $i$, $1 \le i \le k$.

2. For each $i$ from 1 to $k$, do:

Case:

2.1 $\zeta_i$ is a constant $c$: $\rho_i := c$.

2.2 $\zeta_i$ is a general type $G^j$, or a variable $x^j_r$: $\rho_i := x^j_m$, where $x^j_m$ is a fresh
variable unused in any expression so far in this or any calling routine.

2.3 $\zeta_i$ is any other non-terminal $B'$: $\rho_i := \mathrm{Expand}(\tau, \omega \cdot i)$.

3. Return the term $\rho_1 \cdot \ldots \cdot \rho_k$.


Figure 3: The Expand algorithm.
$G^j$ for some $j$, $\hat\omega \cdot 1$ is also a prefix of $\omega$. As noted in the discussion preceding the algorithm,
$\hat\omega$ is in fact the maximum prefix of $\omega$ for which there is a term in $\tau_1$ at location $\hat\omega$ of any
type $B$; for if $B$ were not a general type $G^j$, then either there would be a non-terminal at
location $\hat\omega \cdot 1$ of free($\tau_1$) (contradicting the maximality of $\hat\omega$) or there would be a terminal
symbol at that location and we would have $\hat\omega = \omega$ (contradicting the assumption that $\hat\omega$ is
a proper prefix of $\omega$). Therefore, the term $\tau_1[\hat\omega]$ is a variable, which we'll call $x_1$. Moreover,
the node at location $\hat\omega \cdot 1$ in free($\tau_2$) is labeled by a non-terminal, $B_1$.

Let us analyze the configuration $\tau' = \theta(\tau_1)$ obtained in step 3 after substituting for $x_1$
the result of the call to Expand. By Lemma 37, this term, $\tau'[\hat\omega]$, is a most general term in
$\mathcal{L}(G^j)$ such that $\tau'[\hat\omega] \sqsupseteq \tau_2[\hat\omega]$ and the term at location $\hat\omega \cdot 1$ in $\tau'$ is at least as general as
the corresponding term at location $\hat\omega \cdot 1$ in $\tau_2$. Hence $\tau'[\hat\omega \cdot 1]$ is a term of type $B_1$, and so
$\tau_1[\hat\omega] \sqsupseteq \tau'[\hat\omega] \sqsupseteq \tau_2[\hat\omega]$. Let $\theta$ be the substitution that maps $x_1$ to $\tau'[\hat\omega]$ and other variables
to themselves; then $\tau_1 \sqsupseteq \theta(\tau_1) = \tau' \sqsupseteq \tau_2$. $\tau'$ is then passed as the first argument to the
recursive invocation of Stretch in step 4.

In successive calls to the Stretch algorithm (step 4), let $\tau'_1, \tau'_2, \ldots$ be the sequence of
configurations that are passed as the first argument. In particular, $\tau'_1 = \tau_1$, and if the
algorithm halts, the last term in this sequence is the final output. But we have just seen
that this sequence of configurations is monotone decreasing with respect to the subsumption
ordering $\sqsupseteq$; and the sequence of prefixes $\hat\omega$ of $\omega$ in step 2 increases in length by at least one
on each successive call. Thus this sequence of calls must terminate. And since, by Lemma
37, each configuration $\tau'_{k+1}$ in this sequence is maximally general for those terms having this
sequence of positions, the final output has the required properties. ∎

With these preliminaries we can now develop the AL-1 algorithm. The algorithm applies 
to any NTTRS, but the system and language are “built in”, i.e., are not input parameters 
to the algorithm. For this reason the algorithm is actually an algorithm schema, to be 
instantiated for any particular TRS. Note that, for expository purposes, the formal version 
of the algorithm in Figure 4 contains many more variables than are really required. 

Input to the algorithm is a ground computation of length $n \ge 1$. The output from AL-1
is a new rewrite rule $(\alpha, \beta)$, valid in the sense that $\alpha \Rightarrow^* \beta$ according to the existing rewrite
rules $\mathcal{R}$. Informally, this rule is sufficient to accomplish in a single rewriting step what the
original ground computation achieved in n steps, and moreover is the most general such rule 
for the path. What one might use this new rule for is outside the scope of the algorithm, but 
the rule can be viewed as a “chunk” or “macro-operator”, potentially useful for making the 
program more efficient. Such considerations belong to the deterministic computation model 
(and as such we shall discuss them later). 

The procedure is quite simple. Let 

$[\tau_1, \omega_1, (\alpha_1, \beta_1)], \ldots, [\tau_n, \omega_n, (\alpha_n, \beta_n)], [\tau_{n+1}, *, *]$

be the computation, and let $A_1$ and $B_1$ be program variables, each with an initial value of
$x^0$, a fresh variable of type $G^0$. For each step $i$ in the path, we shall apply substitutions to
$A_i$ and rewrite rules to $B_i$ at the same positions as are applied to the example configuration
$\tau_i$ in the computation. The resulting rule will be $(A_{n+1}, B_{n+1})$.
Algorithm 8 (AL-1)

Input: A ground computation $[\tau_1, \omega_1, (\alpha_1, \beta_1)], \ldots, [\tau_n, \omega_n, (\alpha_n, \beta_n)], [\tau_{n+1}, *, *]$

of length $n$. (We may assume that no two rules in the path have variables in common.)

Output: A rule $(\alpha, \beta)$.

Procedure:

1. Initialize: $A_1 := x^0$, a fresh variable of type $G^0$. $B_1 := x^0$. The algorithm uses the
additional variables $A_i$, $AA_i$, $B_i$, $BB_i$, $\psi_i$, and $\theta_i$ for $1 \le i \le (n + 1)$.

2. For $i := 1, \ldots, n$:

2.1 $BB_i := \mathrm{Stretch}(B_i, \tau_i, \omega_i)$.

2.2 $\psi_i := \mathrm{mgu}(BB_i, B_i)$.

2.3 $AA_i := \psi_i(A_i)$.

2.4 $\theta_i := \mathrm{mgu}(BB_i[\omega_i], \alpha_i)$.

2.5 $B_{i+1} := \theta_i(BB_i[\omega_i \leftarrow \beta_i])$. /* Apply the most general instance of the rule
$(\alpha_i, \beta_i)$ to $BB_i$ at location $\omega_i$ */

2.6 $A_{i+1} := \theta_i(AA_i)$.

3. Output $(\alpha, \beta) = (A_{n+1}, B_{n+1})$.


Figure 4: The AL-1 algorithm.

Suppose we want to rewrite $B_1$ using the same rule $(\alpha_1, \beta_1)$ and position $\omega_1$ as in the first
step of the computation path. Since position $\omega_1$ may not exist in $B_1$, we must first stretch it
so that, if necessary, it acquires a subterm at position $\omega_1$ while remaining as general as $\tau_1$.
Let $BB_1$ be the result of stretching $B_1$. Let $\psi_1$ be the substitution such that $\psi_1(B_1) = BB_1$.
Since $\tau_1[\omega_1]$ is an instance of $\alpha_1$ and $BB_1[\omega_1] \sqsupseteq \tau_1[\omega_1]$, it follows that $BB_1[\omega_1]$ unifies with
(but may not be an instance of) $\alpha_1$:

$BB_1[\omega_1] \sqcap \alpha_1 \sqsupseteq \tau_1[\omega_1] \sqcap \alpha_1 = \tau_1[\omega_1].$

Let $\theta_1$ be a most general unifier of $BB_1[\omega_1]$ and $\alpha_1$; we rewrite $BB_1$ to $\theta_1(BB_1[\omega_1 \leftarrow \beta_1])$, and
call this term $B_2$ [4]. $A_2$, in turn, is obtained by applying the same substitutions $\psi_1$ and $\theta_1$ to
$A_1$ that were necessary in order to rewrite $B_1$. This process then repeats for the remaining
steps of the computation.


[4] In effect we are paramodulating $BB_1$.
Theorem 39 Refer to the notation of Figure 4, in which $\tau_1$ is the initial ground configuration
of the computation, and $(\alpha, \beta)$ the resulting rewrite rule. Let $\pi$ denote the path of the
computation. Then $\alpha$ is the most general configuration $\sqsupseteq \tau_1$ for which $\pi$ is a valid path, and
$\beta$ is the configuration that results from rewriting $\alpha$ along the path $\pi$. Equivalently,

$\alpha = \bigsqcup \{ \tau \in \mathcal{L}(G^0) \mid \tau \sqsupseteq \tau_1 \text{ and there exists a computation using } \pi \text{ starting from } \tau \}.$

Proof: The proof of this theorem is not deep, but it does involve quite a lot of bookkeeping. 
Therefore we include only enough detail (hopefully) to convince the reader of the claims. 
We shall continue using the notation of Figure 4. Note especially that the variables $A_i$,
$B_i$, $AA_i$, $BB_i$, and so forth are assigned only once during the algorithm, so that we can refer
to their values unambiguously in the proof. 

The proof is by induction on the length n of the path, with the following inductive 
hypothesis H(k ): 

For all $i \le k$,

1. There exists a substitution $\zeta_i$ such that $\zeta_i(A_i) = \tau_1$;

2. There exists a substitution $\eta_i$ such that $\eta_i(B_i) = \tau_i$;

3. For all variables $v \in \mathrm{dom}(\zeta_i) \cap \mathrm{dom}(\eta_i)$, $\zeta_i(v) = \eta_i(v)$;

4. $A_i \Rightarrow^* B_i$ via a path consisting of the first $i - 1$ steps of the path $\pi$ of the
input computation.

If this holds for all $k$, then it follows that the algorithm outputs a rewrite rule $(A_{n+1}, B_{n+1})$
such that $A_{n+1} \Rightarrow^* B_{n+1}$ over the $n$-step path $\pi$. That $A_{n+1}$ is the most general such
configuration follows directly from Lemma 38 and the fact that the $\psi_i$'s and $\theta_i$'s in the
algorithm are most-general unifiers.

The basis ($n = 1$) corresponds to the initial situation ($A_1 = x_1 = B_1$) and a path of
length zero (no rewrites). It is clear that the substitutions $\zeta_1(x_1) = \eta_1(x_1) = \tau_1$ satisfy the
hypothesis, and that $A_1$ "rewrites" to $B_1$ trivially over the empty path. Note also that $A_1$
is a most general configuration.

We now assume $H(i)$ and show that $H(i + 1)$ holds. $A_i$ becomes $A_{i+1}$ by first computing
$AA_i = \psi_i(A_i)$ and then applying $\theta_i$ to $AA_i$. Similarly, $B_i$ becomes $B_{i+1}$ by first computing
$BB_i = \psi_i(B_i)$, and then rewriting $BB_i$ using the $i$'th rule in the path.

We first argue that the four properties of the hypothesis are also true of $AA_i$ and $BB_i$
(i.e., they are preserved by the substitution $\psi_i$). To see this, note first that, by Lemma 38,
$B_i \sqsupseteq BB_i \sqsupseteq \tau_i$, verifying property (2).

Let $\eta'$ be the substitution such that $\eta'(BB_i) = \tau_i$. To verify property (1) for $AA_i$, we
construct a substitution $\zeta'$ such that $\zeta'(AA_i) = \tau_1$. For this purpose, partition the variables
$V(A_i)$ in $A_i$ into two groups: those that are in $\mathrm{dom}(\psi_i)$ and those that are not. If $v$ belongs
to the latter, then $\zeta_i(v)$ must be a ground term, and for each occurrence of $v$ in $A_i$, $\zeta_i(v)$
matches this ground term at the same position in $\tau_1$. The stretching process replaces some
variables by terms containing fresh variables only, so no new occurrences of this variable $v$
will be created in $AA_i$ by applying the substitution $\psi_i$. Therefore, by defining $\zeta'(v) = \zeta_i(v)$,
we preserve property (1) for the subset of variables not in $\mathrm{dom}(\psi_i)$.

For property (3), if $v$ occurs in $B_i$, then by hypothesis $\zeta_i(v) = \eta_i(v)$, a ground term, and
since $v$ is not in $\mathrm{dom}(\psi_i)$, $\zeta'(v) = \eta'(v) = \eta_i(v)$. Also, if $v$ does not occur in $B_i$, (3) holds
vacuously.

Now consider a variable $u \in \mathrm{dom}(\psi_i)$. $\psi_i(u)$ is a term containing only fresh variables. The
substitution $\eta'$ maps each of these fresh variables to the ground term at the corresponding
location in $\tau_i$, so $\eta_i(u) = \eta'(\psi_i(u))$. But $\zeta_i(u) = \eta_i(u)$ by property (3), so if $\zeta'$ is defined so as
to agree with $\eta'$ on these fresh variables, again we will retain both properties (1) and (3).

Thus, by composing the two cases above, we see that the substitution $\zeta'$ so constructed maps
$AA_i$ to $\tau_1$ and agrees with $\eta'$ for variables in the intersection of their domains.

Finally, for property (4), we need to check that $\psi_i(A_i) \Rightarrow^* \psi_i(B_i)$ over the same path
(the first $i - 1$ steps of $\pi$). From the facts that $A_i \Rightarrow^* B_i$ over this path, $AA_i \sqsupseteq \tau_1$, and $\psi_i$
introduces only fresh variables, this verification is straightforward but tedious, so we omit
the details.

Next, we argue that $B_{i+1} \sqsupseteq \tau_{i+1}$. Recall that $B_{i+1} = \theta_i(BB_i[\omega_i \leftarrow \beta_i])$, where $\theta_i$ is the
mgu of $BB_i[\omega_i]$ and $\alpha_i$. By assumption, no variables are common to both $\alpha_i$ and $BB_i$, so
$\mathrm{dom}(\theta_i) = V(BB_i[\omega_i]) \cup V(\alpha_i)$. We partition the variables in $BB_i$ into those in $\mathrm{dom}(\theta_i)$ and
those not in the domain. We construct a substitution $\eta_{i+1}$ such that $\eta_{i+1}(B_{i+1}) = \tau_{i+1}$. Since
$BB_i[\omega_i] \sqsupseteq \tau_i[\omega_i]$ and $\alpha_i \sqsupseteq \tau_i[\omega_i]$, it follows that

$BB_i[\omega_i] \sqcap \alpha_i \sqsupseteq \tau_i[\omega_i].$

But

$\theta_i(BB_i[\omega_i]) = BB_i[\omega_i] \sqcap \alpha_i.$

Thus there is a substitution $\phi$ such that $\phi \circ \theta_i = \eta'$ (see Figure 5). Thus if $v$ is a variable in
$\mathrm{dom}(\theta_i) \cap V(BB_i)$, $\phi \circ \theta_i(v)$ is the ground term that occurs at the same location(s) in $\tau_i[\omega_i]$
as $v$ does in $BB_i[\omega_i]$.

If $v$ is not in $\mathrm{dom}(\theta_i)$ but occurs in $BB_i$, then from the above discussion we know that
$\eta'(v)$ is the ground term that occurs at the same location(s) in $\tau_i$ as $v$ does in $BB_i$. Since
these two sets of variables are disjoint, $\eta' \circ \phi \circ \theta_i(BB_i) = \tau_i$.

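Now,

$$
\begin{aligned}
\tau_{i+1} &= \tau_i[\omega_i \leftarrow \phi \circ \theta_i(\beta_i)] \\
&= \eta' \circ \phi \circ \theta_i(BB_i)[\omega_i \leftarrow \phi \circ \theta_i(\beta_i)] \\
&= \eta' \circ \phi \circ \theta_i(BB_i[\omega_i \leftarrow \beta_i]) \\
&\sqsubseteq \theta_i(BB_i[\omega_i \leftarrow \beta_i]) \\
&= B_{i+1}.
\end{aligned}
$$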

To argue that $A_{i+1} \sqsupseteq \tau_1$, we proceed similarly. Partition $V(AA_i)$ into variables in $\mathrm{dom}(\theta_i)$
and other variables. It is the former that are significant, so let $v$ be in $\mathrm{dom}(\theta_i) \cap V(AA_i)$.
From the discussion above, we know that $\zeta'(v) = \eta'(v)$, a ground term in $\tau_1$ (and also in
$\tau_i$). From Fig. 5, $\theta_i(v) \sqsupseteq \eta'(v)$, so $\theta_i(v) \sqsupseteq \zeta'(v)$. Hence $\zeta' \circ \theta_i = \zeta'$ (recalling that $\zeta'$ is a
ground substitution), and

$$
\begin{aligned}
\tau_1 &= \zeta'(AA_i) \\
&= \zeta' \circ \theta_i(AA_i) \\
&= \zeta'(A_{i+1}).
\end{aligned}
$$

It is routine to show that variables common to $A_{i+1}$ and $B_{i+1}$ are mapped to the same
ground terms in $\tau_1$ and $\tau_{i+1}$, respectively, and that $A_{i+1} \Rightarrow^* B_{i+1}$ over the path consisting
of the first $i$ steps of the path in the example. Then the induction is complete, and with it,
the proof. ∎



Figure 5: Substitutions used in the proof of Theorem 39. 


Example 40 [LP] To illustrate AL-1 in a familiar setting, let us see how the AL-1 algorithm
generalizes the example plus(s(s(0)), 0, s(s(0))) using the logic program for plus. The proof
of the example has three steps, shown in the first column of the table (the first line is the
initial state). The numbers in parentheses are the rewrite rules (clauses) used in each of
the steps. To avoid variable conflicts, a variant of each rule using fresh variables has been
applied in each step.

The second and third columns show the values of the variables $A_i$ and $B_i$ after each
step. The substitutions $\theta_i$ are shown in the fourth column. The calls to Stretch are all
"no-ops" in this example since the rewrite positions $\omega_i$ in the computation are all $\epsilon$. Thus
the substitutions $\psi_i$ are all identity functions. Finally, if we apply all the substitutions to
the initial value $x_1$, we obtain, as output, the rule plus(s(s(0)), $x_3$, s(s($x_3$))) $\Rightarrow$ true. ∎


$\tau_i$ (rule) | $A_i$ | $B_i$ | $\theta_i$
plus(s(s(0)), 0, s(s(0))) (ii) | $x_1$ | $x_1$ |
plus(s(0), 0, s(0)) (ii) | plus(s($x_2$), $x_3$, s($x_4$)) | plus($x_2$, $x_3$, $x_4$) | $x_1$ := plus(s($x_2$), $x_3$, s($x_4$))
plus(0, 0, 0) (i) | plus(s(s($x_5$)), $x_3$, s(s($x_6$))) | plus($x_5$, $x_3$, $x_6$) | $x_2$ := s($x_5$), $x_4$ := s($x_6$)
true | plus(s(s(0)), $x_3$, s(s($x_3$))) | true | $x_5$ := 0, $x_6$ := $x_3$
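
The trace in Example 40 can be reproduced with a short sketch. It is not the typed algorithm of Figure 4: it assumes untyped first-order terms, rewrites only at the root position (so Stretch is a no-op, as in the example), and omits the occurs check. The term encoding, the `?`-prefixed variable names, and the function names (`unify`, `resolve`, `rename`, `al1_root`) are all assumptions made for illustration.

```python
# Terms: variables are strings starting with "?", constants are other strings,
# compound terms are tuples (functor, arg1, ..., argk).

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, s):
    """Follow variable bindings in substitution s until an unbound term is reached."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Extend substitution s to unify a and b; return None on failure (no occurs check)."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def resolve(t, s):
    """Apply substitution s to t exhaustively."""
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(resolve(x, s) for x in t[1:])
    return t

def rename(t, i):
    """Produce a variant of t whose variables carry the step index i."""
    if is_var(t):
        return f"{t}_{i}"
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename(x, i) for x in t[1:])
    return t

def al1_root(path):
    """Generalize a computation whose rewrites all occur at the root position.

    path is the list of rules (alpha, beta) used at each step.
    Returns the learned rule (A_{n+1}, B_{n+1}).
    """
    s = {}
    a = b = "?g"                      # A_1 = B_1 = a fresh variable
    for i, (alpha, beta) in enumerate(path, start=1):
        alpha, beta = rename(alpha, i), rename(beta, i)
        s = unify(b, alpha, s)        # theta_i = mgu(B_i, alpha_i)
        if s is None:
            raise ValueError(f"rule {i} does not apply")
        b = beta                      # B_{i+1}; bindings are applied lazily via s
    return resolve(a, s), resolve(b, s)

# The two clauses of the plus program, written as rewrite rules.
PLUS_I = (("plus", "0", "?x", "?x"), "true")
PLUS_II = (("plus", ("s", "?x"), "?y", ("s", "?z")), ("plus", "?x", "?y", "?z"))

lhs, rhs = al1_root([PLUS_II, PLUS_II, PLUS_I])
print(lhs, "=>", rhs)
```

Because every rewrite is at the root, the example configurations $\tau_i$ are not consulted; in the general algorithm they are needed by Stretch to create the rewrite position.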


Example 41 [AP] We apply AL-1 to the plus program in Example 29 and to the computation
in Example 33. The steps are summarized in the following table, and the resulting rule
is: ((plus (s 0)) $x_3$) $\Rightarrow$ (s $x_3$). ∎

$\tau_i$ | $A_i$ | $B_i$ | $\theta_i$
((plus (s 0)) (s 0)) | $x_1$ | $x_1$ |
(s ((plus 0) (s 0))) | ((plus (s $x_2$)) $x_3$) | (s ((plus $x_2$) $x_3$)) | $x_1$ := ((plus (s $x_2$)) $x_3$)
(s (s 0)) | ((plus (s 0)) $x_3$) | (s $x_3$) | $x_2$ := 0

Example 42 [LC] Let us apply AL-1 to the computation sketched in Example 34:
((plus [s, 0]) [s, 0]) $\Rightarrow^*$ [s, [s, 0]]. By analogy with the preceding example, we expect the result to be

(((plus [s, 0]) $x_1$), [s, $x_1$])

(or a variant thereof), and, indeed, this is the outcome. But because the path for the
complete computation is so long, we shall follow only the first few steps.

The input configuration $\tau_1$ is ((plus [s, 0]) [s, 0]), and the path is sketched in Example 34.
We initially take our configuration to be $x_1$ (an expression variable).

The first step, a replacement for plus, occurs at a location not in $x_1$; so $B_1 = x_1$ is first
stretched into $BB_1$ = ((plus $x_2$) $x_3$). To apply the plus rule, we unify plus with itself (so $\theta_1$ is
an identity), and replace plus by its corresponding lambda expression (with fresh variables),
to obtain

$B_2$ = (($\lambda v_1 . \lambda v_2 .$ (((zero? $v_1$) $v_2$) (succ ((plus (second $v_1$)) $v_2$))) $x_2$) $x_3$)
and $A_2$ = ((plus $x_2$) $x_3$).

The next several steps are $\beta$-reductions at locations already in $B_2$, so that stretching has
no effect. Consider the first of these, the $\beta$-substitution of $x_2$ for the parameter $v_1$. The
maximal rewrite rule applicable here is the $\beta$-reduction rule:

(($\lambda v_{91} . \lambda v_{92} .$ ((($x_{101}$ $v_{91}$) $v_{92}$) ($x_{102}$ (($x_{103}$ ($x_{104}$ $v_{91}$)) $v_{92}$))) $x_{105}$) $\Rightarrow$

$\lambda v_{92} .$ ((($x_{101}$ $x_{105}$) $v_{92}$) ($x_{102}$ (($x_{103}$ ($x_{104}$ $x_{105}$)) $v_{92}$))).

The result of applying this rule to $B_2$ is

$B_3$ = ($\lambda v_2 .$ (((zero? $x_2$) $v_2$) (succ ((plus (second $x_2$)) $v_2$))) $x_3$).

The cumulative result of just these two steps is the rule:

((plus $x_2$) $x_3$) $\Rightarrow$ $B_3$.

The remaining steps are similar to the first two, with the final rule being:

((plus [s, 0]) $x_3$) $\Rightarrow$ [s, $x_3$]. ∎

Finally, a simple but useful observation is that $A_1 \sqsupseteq A_2 \sqsupseteq \cdots \sqsupseteq A_{n+1}$, that is, the
left-hand side $\alpha$ of the final rule $(\alpha, \beta)$ becomes monotonically less general as the length of
the path increases. Since $\alpha$ contains the "pre-conditions" that must be satisfied before the
new rule is applicable, it follows that the rule becomes less general as the length of the
path (and in some sense, the amount of information in the computation) becomes larger.
With AL-1, it seems that we learn less and less from more and more. One way to avoid this
problem is discussed later when we consider deterministic rewriting systems.
A Counterexample: Elementary Formal Systems

Elementary formal systems (EFS's) [1, 2, 31] are a form of logic programming useful in
learning formal languages. Briefly, an EFS is a Horn-clause logic programming language
with string concatenation as the only function symbol. For example, let $\Sigma = \{a, b, c\}$ be an
alphabet, and let $x_1, x_2, \ldots$ be variables taking values in $\Sigma^+$. The clause

$p(a x_1 bab x_2)$ :- true (i)

says that any string of the form $a x_1 bab x_2$ (with nonempty strings substituted for the $x_i$'s)
has property $p$. Similarly, the clause

$p(c x_3 x_4)$ :- $p(x_3 b x_4)$ (ii)

states that an instance of the string pattern $c x_3 x_4$ has property $p$ if the instance of $x_3 b x_4$
with the same substitution for $x_3$ and $x_4$ also has property $p$. Combining these two clauses,
we can prove $p(caababba)$, for example, by matching $x_3 = aaba$ and $x_4 = bba$ in the second
clause, reducing the goal to $p(aababbba)$, and then matching $x_1 = a$ and $x_2 = bba$ in the first
clause. The computation is as follows:

$[p(caababba), \epsilon, (ii)] \Rightarrow [p(aababbba), \epsilon, (i)] \Rightarrow [true, *, *].$

EFS's diverge from our definition of nondeterministic TRS's in an interesting way: because
string concatenation obeys an associative equational theory, there may be more than
one way to unify two terms. For example, in matching $p(caababba)$ to $p(c x_3 x_4)$ above, we
could also have taken $x_3 = aa$ and $x_4 = babba$. Doing so leads to a different proof of
$p(caababba)$, over the same path. The computation is as follows:

$[p(caababba), \epsilon, (ii)] \Rightarrow [p(aabbabba), \epsilon, (i)] \Rightarrow [true, *, *].$

For each of these two proofs we can apply the AL-1 algorithm, and as a result we derive
two different rewriting rules: $p(c a x_5 b a x_6)$ :- true in the first case, and $p(c a x_3 a b x_6)$ :- true
in the second. Both are valid clauses in this theory, and yet neither is a variant of the other.
We conclude that Theorem 39 does not hold for EFS's.
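
The non-uniqueness of matching under associativity can be checked by brute force. The sketch below hard-codes the two clauses (i) and (ii) as string patterns and enumerates every way of proving a goal through clause (ii) followed by clause (i); the function names and the tuple layout of the result are assumptions made for illustration.

```python
def splits(s):
    """All ways to write s = x3 + x4 with both parts nonempty."""
    return [(s[:i], s[i:]) for i in range(1, len(s))]

def match_clause_i(s):
    """All (x1, x2), both nonempty, such that s == 'a' + x1 + 'bab' + x2."""
    out = []
    if not s.startswith("a"):
        return out
    for i in range(2, len(s)):                # x1 = s[1:i] is nonempty
        if s[i:i + 3] == "bab" and len(s) > i + 3:
            out.append((s[1:i], s[i + 3:]))   # x2 = s[i+3:] is nonempty
    return out

def proofs(goal):
    """All (x3, x4, x1, x2) proving p(goal) by clause (ii) and then clause (i)."""
    found = []
    if not goal.startswith("c"):
        return found
    for x3, x4 in splits(goal[1:]):           # match goal against c x3 x4
        subgoal = x3 + "b" + x4               # body of clause (ii)
        for x1, x2 in match_clause_i(subgoal):
            found.append((x3, x4, x1, x2))
    return found

found = proofs("caababba")
for x3, x4, x1, x2 in found:
    print(f"x3={x3} x4={x4} -> p({x3}b{x4}) with x1={x1} x2={x2}")
```

Both matchings discussed in the text appear among the results, each following the same rule sequence but with different unifiers.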

What property of EFS's keeps them from qualifying as typed-term rewriting systems?
Clearly, the existence of multiple mgu's for a pair of terms is part of the problem, but
the Robinsonian nature of mgu's is an inferred, not a defined, property of our typed-term
languages. The actual reason is that, to admit associativity in string concatenation, the ttg
for the language needs to be ambiguous. For example, to match the pattern $x_1 x_2$ to the
string $aba$, we must be able to parse $aba$ as both $a \cdot (b \cdot a)$ and $(a \cdot b) \cdot a$.

This counterexample is interesting because it shows that the standard EBG algorithm as
it is currently used in Machine Learning does not work as expected for at least one important
family of computational languages. Two ways come to mind whereby we could bring
EFS's within the scope of our nondeterministic term-rewriting systems. One is to adjoin
explicit rewrite rules for the associative theory, $(x_1 \cdot (x_2 \cdot x_3), (x_1 \cdot x_2) \cdot x_3)$, for example. The
computational path would thereby record explicitly the steps taken to associate strings in a
particular way for matching, and thereby implicitly record the unifiers used in the computation.
The other way is to extend the definition of a computation so as to record explicitly
the particular unifiers used in matching rules to examples; a computation would thus be a
sequence of 4-tuples instead of 3-tuples. The AL-1 algorithm then remains essentially the
same, except that the particular mgu's computed in steps 2.2 and 2.4 would be determined
by those used in the example. Theorem 39 also holds, with the understanding that the word
"path" includes the mgu's along with the rules and positions.

The AL-2 Algorithm 

The formalism we have developed based on ttg’s and TRS’s has been useful for extending 
one algorithm to a large family of programming languages. Our belief, however, is that the
formalism is useful in general for studying analytical learning algorithms. As evidence, we
shall use the same framework to develop another analytical learning algorithm, AL-2. Like 
AL-1, the new algorithm learns from success, preserves the correctness of the program, and 
outputs new rules for potential inclusion in the knowledge base. Unlike AL-1, however, 
AL-2 modifies the language in which the rules are expressed. New constants and/or function 
symbols may be introduced in order to abbreviate frequently occurring terms and to shorten 
common sub-proofs. This resembles what humans do, for example, when we say "EBG"
instead of "explanation-based generalization" or "prime" instead of "natural number divisible
only by itself and one". Moreover, AL-1 and AL-2 can be used together or independently.

Example 43 [LP] Consider again the program for addition in Example 28. After using this
program for addition many times, an observer might determine that the term s(s(0)) occurs
sufficiently often that significant savings might well be realized by shortening this term to
just a single character, say "2". As a result, every time this term occurs in a computation,
instead of writing seven characters, only one need be written. The symbol "2" is not now
in the language, so the grammar must be modified in a straightforward way to generate this
additional constant. Some goals, such as plus(0, 2, s(s(0))), may then be correctly handled
without any modifications to the program. Other goals, such as plus(2, 0, 2), fail because the
program is not designed to handle such terms. ∎

Example 44 [LC] In Example 30 we introduced several abbreviations in order to shorten
the expressions we were working with. For example, we wrote 0 in place of $\lambda v_1 . v_1$ and false
in place of $\lambda v_1 . \lambda v_2 . v_2$. We treated these as abbreviations for human consumption only, but
it is reasonable to consider how to incorporate these into the program so that, for example,
((plus 0) 0) $\Rightarrow^*$ 0. If one tried to carry out this computation as things stand, the subterm

(0 ($\lambda v_2 . \lambda v_3 . v_2$))

would soon occur. This can be rewritten only after replacing 0 by the equivalent lambda
expression. ∎
The AL-2 algorithm takes a computation and an abbreviation, and returns a list of new
rewrite rules which, now that they are available, enable us to do the computation (and
some related generalizations of it) after substituting the abbreviated term into the initial
configuration. Like AL-1, the input to the learning process is a successful computation, and
the result of learning is a set of new rewrite rules.

Before describing the AL-2 algorithm in detail, let us consider simple examples of how it
works.

Example 45 [LP] Suppose that we decide to abbreviate the term s(s(0)) by the new symbol 
“2”. How should we modify the logic program to utilize this abbreviation? There is, of 
course, a trivial way to incorporate the abbreviation: provide a preprocessor that changes 
all occurrences of 2 in the input into s(s(0)), use the program as is, and then replace all 
s(s(0)) terms in the output by 2. But this clearly saves neither time nor space in the 
computation. 

Here is a better way. When we try to satisfy the goal plus(2, 0, x), using the program
above, we run into trouble because 2 fails to unify with s($x_2$) in the second clause. Since 2
is just an abbreviation for s(s(0)), the clause

plus(2, $x_3$, s($x_4$)) :- plus(s(0), $x_3$, $x_4$)

is clearly valid. We obtain it by instantiating $x_2$ to s(0) and replacing the resulting occurrence
of s(s(0)) on the left by 2. With the addition of this clause to the program, the family of
goals plus(2, $x_1$, s($x_2$)) can be solved in terms of the new symbol 2. ∎
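
The derivation in Example 45 (instantiate, then abbreviate) is mechanical enough to sketch directly. The encoding below is an assumption for illustration: variables are strings beginning with `?`, compound terms are tuples, and `subst` and `abbreviate` are hypothetical helper names.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def subst(t, s):
    """Apply substitution s (a dict from variables to terms) to term t."""
    if is_var(t):
        return s.get(t, t)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(x, s) for x in t[1:])
    return t

def abbreviate(t, sigma, tau):
    """Replace every occurrence of the ground term tau in t by the symbol sigma."""
    if t == tau:
        return sigma
    if isinstance(t, tuple):
        return (t[0],) + tuple(abbreviate(x, sigma, tau) for x in t[1:])
    return t

TAU = ("s", ("s", "0"))                        # the term abbreviated by "2"

# Second clause of the plus program: plus(s(x2), x3, s(x4)) :- plus(x2, x3, x4)
head = ("plus", ("s", "?x2"), "?x3", ("s", "?x4"))
body = ("plus", "?x2", "?x3", "?x4")

theta = {"?x2": ("s", "0")}                    # instantiate x2 to s(0)
new_head = abbreviate(subst(head, theta), "2", TAU)
new_body = abbreviate(subst(body, theta), "2", TAU)
print(new_head, ":-", new_body)
```

The printed clause corresponds to plus(2, $x_3$, s($x_4$)) :- plus(s(0), $x_3$, $x_4$), the clause given in the example.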

Example 46 [LC] Suppose we wish to perform the computation (zero? 0) without replacing
0 by $\lambda v_1 . v_1$. We have no trouble with the first step, replacing zero? by its definition,

(zero? 0) $\Rightarrow$ ($\lambda v_2 .$($v_2$ true) 0),

nor with the next, a $\beta$-reduction leading to (0 true). At this point, however, we are stuck.
If we check how this step is done when 0 is written $\lambda v_3 . v_3$, we see that the rule being applied
is (($\lambda v_3 . v_3$ $v_4$), $v_4$). After introducing the abbreviation into this rule, we see that the only
new rule we need in order to complete the computation is

((0 $v_4$), $v_4$).

The computation then concludes successfully, with a result of true. ∎

Definition 47 Let $B$ be a type in a typed-term grammar. A synonym of type $B$ is a pair
$(\sigma, \tau)$, where $\sigma$ is a symbol not in the grammar (terminal or otherwise) and $\tau \in \mathcal{L}(B)$ is
a ground term of type $B$. We shall often refer to $\sigma$ as simply a synonym for $\tau$, and write
$\sigma \sim_B \tau$. Normally, the type of the synonym is clear, and we omit the subscript $B$.

Example 48 [LP] In the preceding LP example, "2" is a synonym of type Term for s(s(0)).
It is equally a synonym of type Term1, but in general it is preferable to consider it an element
of the most general type possible in the grammar. ∎
The AL-2 algorithm outputs rewrite rules in which the new symbol $\sigma$ occurs; consequently
the grammar $\mathcal{G}$ must be extended to a new ttg $\mathcal{G}'$ in such a way that both $\sigma$ and $\tau$ are in
$\mathcal{L}(B)$. One way to accomplish this is as follows:

1. If $B$ has arity one, add the production $B \to \sigma$ to the grammar.

2. Otherwise, make $B$ have arity one by replacing all productions $B \to \ldots$ by $B' \to \ldots$,
where $B'$ is a new non-terminal, and then add the productions $B \to \sigma$ and $B \to B'$.

We assume henceforth that $\mathcal{G}$ has been so modified.

Definition 49 Let $\sigma \sim \tau$ be a synonym of type $B$ and let $\zeta$ be a term in $\mathcal{L}(B')$ for some
non-terminal $B'$. The $\sigma$-abbreviation of $\zeta$, written $\zeta|_\sigma$, is obtained from $\zeta$ by replacing all
type-$B$ occurrences in $\zeta$ of the subterm $\tau$ by $\sigma$.

Lemma 50 For every non-terminal $B'$, $\mathcal{L}(B')$ is closed under $\sigma$-abbreviations. ∎

Just as Expand is the operation upon which AL-1 is based, Reunify forms the basis
for AL-2. Input to Reunify consists of a synonym $\sigma \sim_B \tau$ of some fixed type $B$, a term
$\zeta$, and a ground instance $\zeta_0$ of $\zeta$. The output is a term $\zeta'$ such that $\zeta \sqsupseteq \zeta' \sqsupseteq \zeta_0$ and
$\zeta'|_\sigma \sqsupseteq \zeta_0|_\sigma$. Intuitively, Reunify finds an instance of $\zeta$ that is as general as $\zeta_0$ both
with and without the abbreviation $\sigma$. For example, when $\zeta$ is plus(s(x), y, s(z)) and $\zeta_0$
is plus(s(s(0)), s(s(0)), s(s(s(s(0))))), then $\zeta_0|_\sigma$ is not an instance of $\zeta|_\sigma$. However, $\zeta'$ =
plus(s(s(0)), y, s(z)) is an instance of $\zeta$ such that $\zeta' \sqsupseteq \zeta_0$ and $\zeta'|_\sigma \sqsupseteq \zeta_0|_\sigma$.
The algorithm for Reunify is shown in Figure 6.

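Lemma 51 In the notation in Figure 6, the output of Reunify is a least upper bound of all
terms $\zeta'$ such that $\zeta \sqsupseteq \zeta' \sqsupseteq \zeta_0$ and $\zeta'|_\sigma \sqsupseteq \zeta_0|_\sigma$.

PROOF: The proof is by induction on $n$, the number of locations $\omega$ such that $(\zeta_0|_\sigma)[\omega] = \sigma$
and $(\zeta|_\sigma)[\omega] \not\sqsupseteq (\zeta_0|_\sigma)[\omega]$. Suppose $n = 0$. Then $\zeta' = \zeta$ and $\zeta' = \zeta \sqsupseteq \zeta_0$. We claim that,
with $n = 0$, there is no location $\omega'$ such that $(\zeta|_\sigma)[\omega'] \not\sqsupseteq (\zeta_0|_\sigma)[\omega']$; it then follows that
$\zeta|_\sigma \sqsupseteq \zeta_0|_\sigma$. Suppose such a location $\omega'$ exists. Since $\zeta \sqsupseteq \zeta_0$, there exists a substitution $\theta$
such that $\theta(\zeta) = \zeta_0$, and hence $(\theta(\zeta))[\omega']|_\sigma = \zeta_0[\omega']|_\sigma$. By assumption, however,
$(\theta(\zeta|_\sigma))[\omega'] \ne (\zeta_0|_\sigma)[\omega']$. In other words, substituting the abbreviation $\sigma$ for $\tau$ in $\zeta_0$ has the effect that
$\theta$ is no longer a unifier for the subterms at position $\omega'$. The term $(\zeta|_\sigma)[\omega']$, therefore, must
contain at least one variable $x$ such that $\theta(x)$ is a subterm of $\tau$. Let $\omega''$ be the location of
this occurrence of $\tau$ in $\zeta_0$. Then both $(\zeta|_\sigma)[\omega''] \not\sqsupseteq (\zeta_0|_\sigma)[\omega'']$ and $(\zeta_0|_\sigma)[\omega''] = \sigma$. But then
$n > 0$, contrary to our assumption. Thus no such $\omega'$ exists, and $\zeta$ satisfies the lemma.

Next, assume that the lemma holds for all terms for which $n \le k$, and suppose $\zeta$ is a
term such that $\zeta \sqsupseteq \zeta_0$ and $n = k + 1$. Let $\omega$ be the location chosen in step 2.1 and $\theta$ the
substitution in step 2.2. Since $\tau$ is ground, $\theta$ is unique. Clearly $\zeta \sqsupseteq \theta(\zeta)$, and $\theta(\zeta) \sqsupseteq \zeta_0$.
When $\theta$ is applied to $\zeta$, $n$ decreases by at least one. Hence the term $\theta(\zeta)$ satisfies the
inductive hypothesis, and the result of subsequent iterations of the algorithm is a term $\zeta'$
such that $\zeta \sqsupseteq \theta(\zeta) \sqsupseteq \zeta' \sqsupseteq \zeta_0$ and $\zeta'|_\sigma \sqsupseteq \zeta_0|_\sigma$, in accordance with the claim in the lemma.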
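Algorithm 4 (Reunify)

Input: A synonym $\sigma \sim_B \tau$.

Terms $\zeta$ and $\zeta_0$, with $\zeta \sqsupseteq \zeta_0$ and $\zeta_0$ ground.

Output: A term $\zeta'$ such that $\zeta \sqsupseteq \zeta' \sqsupseteq \zeta_0$ and $\zeta'|_\sigma \sqsupseteq \zeta_0|_\sigma$.

Procedure:

1. Initialize: $\zeta' := \zeta$.

2. While $\zeta'|_\sigma \not\sqsupseteq \zeta_0|_\sigma$, do:

2.1 Let $\omega$ be a location such that $(\zeta'|_\sigma)[\omega] \not\sqsupseteq (\zeta_0|_\sigma)[\omega]$ and $(\zeta_0|_\sigma)[\omega] = \sigma \in \mathcal{L}(B)$.

2.2 Let $\theta = \mathrm{mgu}(\zeta'[\omega], \tau)$.

2.3 $\zeta' := \theta(\zeta')$.

3. Output $\zeta'$.

Figure 6: The Reunify algorithm.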

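To argue that $\zeta'$ is maximally general, suppose that another term $\zeta'' \sqsupseteq \zeta'$ can be found
such that $\zeta \sqsupseteq \zeta'' \sqsupseteq \zeta_0$ and $\zeta''|_\sigma \sqsupseteq \zeta_0|_\sigma$. Note that the substitution $\theta$ such that $\theta(\zeta) = \zeta'$ is
a ground substitution (it substitutes ground terms for some of the variables of $\zeta$). Thus if
$\zeta' \ne \zeta''$, then there exists a variable $x$ in $\mathrm{dom}(\theta)$ not in $\mathrm{dom}(\mathrm{mgu}(\zeta, \zeta''))$. But we have seen
that every variable in $\mathrm{dom}(\theta)$ occurs within a subterm $\zeta[\omega]$ of $\zeta$ such that $(\zeta|_\sigma)[\omega] \not\sqsupseteq (\zeta_0|_\sigma)[\omega]$.
Thus $(\zeta''|_\sigma)[\omega] \not\sqsupseteq (\zeta_0|_\sigma)[\omega]$, a contradiction. Thus the only possibility is that $\zeta'' = \zeta'$. ∎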

Example 52 [LP] Suppose we invoke Reunify with the synonym 2 ~ s(s(0)), and use it
to specialize $\zeta$ = plus(s(x), y, s(z)) so that the resulting term subsumes $\zeta_0|_\sigma$ = plus(2, 2,
s(s(2))) after introducing the abbreviation. Initially $\zeta'$ = plus(s(x), y, s(z)). Thus $\zeta'|_\sigma = \zeta$.
The subterm s(x) occurs at a location in $\zeta'|_2$ that does not unify with the corresponding
term ("2") in $\zeta_0|_2$. So we substitute x := s(0) and replace $\zeta'$ with plus(s(s(0)), y, s(z)). Now
$\zeta'|_2 \sqsupseteq \zeta_0|_2$, and the Reunify procedure terminates. ∎
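
The loop of Figure 6 can be sketched for untyped first-order terms as follows. This is an illustrative approximation rather than the typed algorithm: it treats a location as failing exactly when $\zeta_0$ holds $\tau$ there while $\zeta'$ holds a non-variable term other than $\tau$, it ignores types, and it omits the occurs check. The encoding (`?`-prefixed variables, tuples for compound terms) and all function names are assumptions.

```python
def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def subst(t, s):
    """Apply substitution s to t, following chains of bindings."""
    if is_var(t):
        return subst(s[t], s) if t in s else t
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(x, s) for x in t[1:])
    return t

def unify(a, b, s=None):
    """Most general unifier of a and b as a dict, or None (no occurs check)."""
    s = {} if s is None else s
    a, b = subst(a, s), subst(b, s)
    if a == b:
        return s
    if is_var(a):
        return {**s, a: b}
    if is_var(b):
        return {**s, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and len(a) == len(b) and a[0] == b[0]):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def abbreviate(t, sigma, tau):
    """Replace every occurrence of the ground term tau in t by sigma."""
    if t == tau:
        return sigma
    if isinstance(t, tuple):
        return (t[0],) + tuple(abbreviate(x, sigma, tau) for x in t[1:])
    return t

def occurrences(t, tau, pos=()):
    """Yield the positions (tuples of argument indices) at which tau occurs in t."""
    if t == tau:
        yield pos
    elif isinstance(t, tuple):
        for i, x in enumerate(t[1:], start=1):
            yield from occurrences(x, tau, pos + (i,))

def at(t, pos):
    """Subterm of t at pos, or None if pos runs past a variable or constant."""
    for i in pos:
        if not isinstance(t, tuple) or i >= len(t):
            return None
        t = t[i]
    return t

def reunify(tau, zeta, zeta0):
    """Specialize zeta (which subsumes ground zeta0) until abbreviation is safe."""
    cur, changed = zeta, True
    while changed:
        changed = False
        for pos in occurrences(zeta0, tau):          # where sigma will appear
            sub = at(cur, pos)
            if sub is None or is_var(sub) or sub == tau:
                continue                             # already subsumes sigma here
            theta = unify(sub, tau)                  # step 2.2
            if theta is None:
                raise ValueError("zeta does not subsume zeta0")
            cur = subst(cur, theta)                  # step 2.3
            changed = True
    return cur

TAU = ("s", ("s", "0"))
zeta = ("plus", ("s", "?x"), "?y", ("s", "?z"))
zeta0 = ("plus", TAU, TAU, ("s", ("s", TAU)))        # abbreviates to plus(2, 2, s(s(2)))
result = reunify(TAU, zeta, zeta0)
print(result, "|", abbreviate(result, "2", TAU))
```

On the inputs of Example 52 the loop performs the single substitution x := s(0) and then stops, since the remaining occurrences of $\tau$ in $\zeta_0$ fall under variables of $\zeta'$.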

The AL-2 algorithm takes a synonym $\sigma \sim \tau$ of some type $B$ and a computation of length
$n \ge 1$, and produces a set $Q$ of new rules incorporating the synonym. The intention is that,
by combining the rules in $Q$ and $\mathcal{R}$, one can subsequently perform calculations over the same
path using the extended language, except that some of the rules in the path may be
replaced by their counterparts from $Q$.

The algorithm (Figure 7) is quite simple. The left-hand side of each rule along the path
is reunified with the corresponding subterm of the configuration, using the synonym $\sigma$. The
resulting substitution is applied to both sides of the rule; after introducing $\sigma$ into the result,
this pair is added to the set $Q$ as a new rule.
Algorithm 5 (AL-2)

Input: A synonym $\sigma \sim \tau$ of type $B$.

A ground computation $[\tau_1, \omega_1, (\alpha_1, \beta_1)], \ldots, [\tau_n, \omega_n, (\alpha_n, \beta_n)], [\tau_{n+1}, *, *]$
of length $n$.

Output: A set $Q$ (perhaps empty) of rewrite rules.

Procedure:

1. Initialize: $Q := \emptyset$.

2. For $i := 1, \ldots, n$:

2.1 $\alpha'_i := \mathrm{Reunify}(\sigma \sim \tau, \alpha_i, \tau_i[\omega_i])$.

2.2 If $\alpha'_i \ne \alpha_i$, then

2.21 $\theta_i := \mathrm{mgu}(\alpha'_i, \alpha_i)$.

2.22 $Q := Q \cup \{(\alpha'_i|_\sigma, \beta'_i|_\sigma)\}$, where $\beta'_i = \theta_i(\beta_i)$.

3. Output $Q$.

Figure 7: The AL-2 algorithm.

As in our presentation of the AL-1 algorithm, the algorithm in Figure 7 uses more vari- 
ables than necessary in order to simplify the proof of the next theorem. The following lemma, 
whose proof is routine, is also used: 

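Lemma 53 Let $\zeta_1$ and $\zeta_2$ be configurations with a common location $\omega$ such that

1. $\zeta_1 \sqsupseteq \zeta_2$;

2. both $\zeta_1$ and $\zeta_2$ can be rewritten at location $\omega$ using the rewrite rule $(\alpha, \beta)$.

Then there are substitutions $\theta_1$ and $\theta_2$ such that $\theta_1(\alpha) = \zeta_1[\omega]$, $\theta_2(\alpha) = \zeta_2[\omega]$, and
$\theta_1(\zeta_1[\omega \leftarrow \beta]) \sqsupseteq \theta_2(\zeta_2[\omega \leftarrow \beta])$, i.e., the ordering relation still holds after rewriting. ∎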

Theorem 54 Refer to the notation of Figure 7, in which $\tau_1$ is the initial ground configuration
of the computation. Let $\mathcal{R}' = \mathcal{R} \cup Q$, the set of rewrite rules obtained by combining the
original rules $\mathcal{R}$ with the set $Q$ returned by the AL-2 algorithm. Let $\zeta_1$ be any configuration
such that $\zeta_1|_\sigma \sqsupseteq \tau_1|_\sigma$ and $\zeta_1 \Rightarrow^* \zeta_{n+1}$ over the same path $\pi$ as the input computation.

Then $\zeta_1|_\sigma \Rightarrow^* \zeta_{n+1}|_\sigma$ over a path of length $n$, using the rules in $\mathcal{R}'$. In particular,
$\tau_1|_\sigma \Rightarrow^* \tau_{n+1}|_\sigma$, so that the original example can be recomputed with the abbreviation substituted
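into the original configuration.

PROOF: Since the path $\pi$ applies to $\zeta_1$, $\alpha_1 \sqsupseteq \zeta_1[\omega_1]$. If it happens that $\alpha_1 \sqsupseteq (\zeta_1|_\sigma)[\omega_1]$, then
$\alpha'_1 = \alpha_1$. Otherwise $\alpha'_1$ is the result of reunification in step 2.1. By Lemma 51,

$\alpha_1 \sqsupseteq \alpha'_1 \sqsupseteq \zeta_1[\omega_1] \sqsupseteq \tau_1[\omega_1]$

and

$\alpha'_1|_\sigma \sqsupseteq (\zeta_1|_\sigma)[\omega_1] \sqsupseteq (\tau_1|_\sigma)[\omega_1]$.

Let $(\alpha'_1|_\sigma, \beta'_1|_\sigma)$ be the rule added to $Q$ in step 2.22 if $\alpha'_1 \neq \alpha_1$, or $(\alpha_1, \beta_1)$ otherwise. By
Lemma 53, if we use this rule to rewrite $\zeta_1|_\sigma$ to $\zeta_2|_\sigma$ and to rewrite $\tau_1|_\sigma$ to $\tau_2|_\sigma$ at location
$\omega_1$, we have that $\zeta_2|_\sigma \sqsupseteq \tau_2|_\sigma$. By assumption, $\zeta_2 \Rightarrow^* \zeta_{n+1}$ over the last $n - 1$ steps of the
path $\pi$. Thus we can repeat this argument $n - 1$ times more, obtaining a path
over which $\zeta_1|_\sigma \Rightarrow^* \zeta_{n+1}|_\sigma$. ∎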

Example 55 [LP] Consider the 3-step computation

plus(s(s(0)), s(0), s(s(s(0)))) $\Rightarrow$ plus(s(0), s(0), s(s(0))) $\Rightarrow$ plus(0, s(0), s(0)) $\Rightarrow$ true.

With the synonym 2 ~ s(s(0)), the initial goal is plus(2, s(0), s(2)). For the first step of the
computation, AL-2 adds the rule

plus(2, x_3, s(2)) :- plus(s(0), x_3, 2)    (2)

to Q. For the second step, the rule

plus(s(x_4), x_5, 2) :- plus(x_4, x_5, s(0))    (3)

is added. No new rule is required for the final step. Thus Q consists of the two rules, (2)
and (3). ∎
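The abbreviated computation of Example 55 can be replayed mechanically. The following is a minimal sketch, not the paper's implementation: it assumes the underlying program consists of the two clauses plus(0, y, y) :- true and plus(s(x), y, s(z)) :- plus(x, y, z), it reads rule (3) as plus(s(x_4), x_5, 2) :- plus(x_4, x_5, s(0)), and it treats "2" as an ordinary constant. The term encoding (tuples, variables written with a leading "?") and the names `match`, `subst`, and `run` are illustrative choices.

```python
def S(t):
    """Build the successor term s(t)."""
    return ("s", t)

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def match(pattern, term, s):
    """One-way matching: extend s so that pattern, instantiated by s, equals term."""
    if is_var(pattern):
        if pattern in s:
            return s if s[pattern] == term else None
        return {**s, pattern: term}
    if (isinstance(pattern, tuple) and isinstance(term, tuple)
            and pattern[0] == term[0] and len(pattern) == len(term)):
        for p, t in zip(pattern[1:], term[1:]):
            s = match(p, t, s)
            if s is None:
                return None
        return s
    return s if pattern == term else None

def subst(t, s):
    """Apply substitution s to term t."""
    if is_var(t):
        return s.get(t, t)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(a, s) for a in t[1:])
    return t

RULES = [
    (("plus", "2", "?x3", S("2")), ("plus", S("0"), "?x3", "2")),     # rule (2)
    (("plus", S("?x4"), "?x5", "2"), ("plus", "?x4", "?x5", S("0"))),  # rule (3)
    (("plus", "0", "?y", "?y"), "true"),                               # assumed base clause
    (("plus", S("?x"), "?y", S("?z")), ("plus", "?x", "?y", "?z")),    # assumed recursive clause
]

def run(goal):
    """Rewrite a ground goal with the first applicable rule until 'true' is reached."""
    trace = [goal]
    while goal != "true":
        for head, body in RULES:
            s = match(head, goal, {})
            if s is not None:
                goal = subst(body, s)
                break
        else:
            raise ValueError("no rule applies")
        trace.append(goal)
    return trace

trace = run(("plus", "2", S("0"), S("2")))
```

Under these assumptions the trace has the same length as the original computation: rule (2) fires first, rule (3) second, and the assumed base clause last.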

In the preceding example, suppose that we subsequently apply the AL-1 algorithm to the
computation using the new rules starting from the goal plus(2, s(0), s(2)). The result is the
rule:

plus(2, s(0), s(2)) :- true.

This can be used to supplant rule (2), since for this particular program, y := s(0) is the only
valid instance of x_3. Similarly, applying AL-1 to the second goal plus(s(0), s(0), 2) in the
path yields the rule

plus(s(0), s(0), 2) :- true.

That the condition $\zeta_1|_\sigma \sqsupseteq \tau_1|_\sigma$ in Theorem 54 is necessary is illustrated by the next
example.

Example 56 [LP] With the three-step computation starting from plus(s(s(0)), 0, s(s(0))) $\Rightarrow^*$
true and the synonym 2 ~ s(s(0)), AL-2 constructs the one rule,

plus(2, x_1, 2) :- plus(s(0), x_1, s(0)).    (4)

With this rule, the goal plus(2, 0, x_2) is computable over the same path, but plus(s(x_3), 0, 2)
is not. (Note that plus(s(x_3), 0, 2) $\not\sqsupseteq$ plus(2, 0, 2).) ∎
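Steps 2.21 and 2.22 of Figure 7 can be traced on Example 56 with a small unifier. This is a sketch under stated assumptions: the original clause is taken to be plus(s(x), y, s(z)) :- plus(x, y, z), and Reunify (defined elsewhere in the paper and not implemented here) is assumed to return the specialized left-hand side plus(s(s(0)), y, s(s(0))). The tuple encoding of terms and the helper names `unify`, `subst`, and `abbreviate` are illustrative.

```python
def S(t):
    return ("s", t)

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def walk(t, s):
    """Follow variable bindings in s until reaching a non-bound term."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s=None):
    """Return a most general unifier of a and b as a dict, or None."""
    s = {} if s is None else s
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return None if occurs(b, a, s) else {**s, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def subst(t, s):
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(a, s) for a in t[1:])
    return t

def abbreviate(t, sigma, tau):
    """Replace every occurrence of the ground term tau in t by the constant sigma."""
    if t == tau:
        return sigma
    if isinstance(t, tuple):
        return (t[0],) + tuple(abbreviate(a, sigma, tau) for a in t[1:])
    return t

TWO = S(S("0"))                                   # tau, abbreviated by sigma = "2"
alpha = ("plus", S("?x"), "?y", S("?z"))          # assumed original rule head
beta = ("plus", "?x", "?y", "?z")                 # assumed original rule body
alpha_prime = ("plus", TWO, "?y", TWO)            # assumed output of Reunify

theta = unify(alpha_prime, alpha)                 # step 2.21
head = abbreviate(subst(alpha, theta), "2", TWO)  # step 2.22, left-hand side
body = abbreviate(subst(beta, theta), "2", TWO)   # step 2.22, right-hand side

# Which goals can use the new rule? Unify each goal with its head.
ok_goal = unify(("plus", "2", "0", "?x2"), head)
bad_goal = unify(("plus", S("?x3"), "0", "2"), head)
```

With these inputs, `head` and `body` coincide with rule (4) up to the name of the variable, the goal plus(2, 0, x_2) unifies with the new head, and plus(s(x_3), 0, 2) does not, because the constant 2 is not an instance of s(x_3) in the extended language.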

In the two preceding examples, the rules found by AL-2 each have only a single valid in- 
stance, and hence are not very interesting. In the next two examples, however, the algorithm 
produces stronger generalizations. 

Example 57 [LC] If we apply the AL-2 algorithm to the computation

plus [s, [s, 0]] 0 $\Rightarrow^*$ [s, [s, 0]],

using the lambda-calculus program of Example 30 and the synonym 2 ~ [s, [s, 0]], the AL-2
algorithm fails to find any need to reunify until reaching the following configuration:

$((\underline{(2\ (\lambda v_2.\lambda v_3.v_2))}\ 0)\ J)$,

where J = (succ ((plus (second [s, [s, 0]])) 0)). The $\beta$-reduction rule that is used in the
(unabbreviated) ground computation is:

$(\lambda v_4.((v_4\ x_1)\ x_2)\ x_3) \Rightarrow ((x_3\ x_1)\ x_2)$.

When we reunify the underlined subterm and the synonym for 2 with the left-hand side of
this rule, and then apply the resulting substitution to the right-hand side of the rule, we
obtain the following new rule:

$(2\ x_3) \Rightarrow ((x_3\ s)\ [s, 0])$.    (5)

Continuing the computation, we again encounter a reduction of the form $2\ (\lambda v_2.\lambda v_3.v_2)$, but
the same new rule (5) is derived. In the end, the set Q consists of just the rule (5).

It is interesting to note that this rule is neither a $\beta$-reduction rule nor a name-replacement
rule, but instead a combinator applying the constant “2” to an arbitrary argument: in fact,
a complete and correct definition of the meaning of this symbol in the enlarged language. ∎

Example 58 [AP] When we apply the AL-2 algorithm to the computation ((plus 2) 0) $\Rightarrow^*$ 2,
using the program of Example 29 and the synonym 2 ~ (s (s 0)), the result is the new rule,

((plus 2) x_1) = (s ((plus (s 0)) x_1)).

When AL-1 is also applied to the example computation, the result is the more concise rule,
((plus 2) x_1) = (s (s x_1)). ∎

Comparing the three preceding examples, we note that, whereas the three computations
are semantically the same (2 + 0 = 2), the rules found by AL-2 on behalf of the abbreviation
2 are entirely different, in both form and generality. This stands in marked contrast to the
AL-1 algorithm, where the semantics of the learned rules were, in some sense, independent
of the syntax of the TRS. It is not hard to see why: an abbreviation is, after all, a syntactic
modification, and the operational role of a symbol, such as “2”, is expected to vary with the
TRS. Thus 2 is a combinator (like plus) in LC, a function in AP, and a constant term in
LP.

Finally, let us note that a more general algorithm than AL-2 can be devised that in- 
troduces more complex abbreviations than just constants. For example, if we wanted to 
introduce the formal abbreviation “$[x_1, x_2]$” for “$\lambda v_1.((v_1\ x_1)\ x_2)$”, we could not do it using
the AL-2 procedure, because this abbreviation is not a constant. Extending the algorithm 
in this way entails modifying the language, including the ttg, in ways that are difficult to 
generalize over the full range of our NTTRS’s, but the fundamental concepts and procedures 
remain essentially the same. 

Deterministic Term-Rewriting Systems 

The NTTRS model has three main features that enable it to extend the EBG algorithm 
to other languages: 

• The ability to generalize and specialize while preserving types. 

• A general computational process (term rewriting) common to the programming lan- 
guages used for Artificial Intelligence. 

• Nondeterminism. 

As noted above, the use of a nondeterministic model is appropriate for algorithms that learn 
from success, because the nondeterminism assumption abstracts away all of the backtracking
search that occurs in any actual, deterministic system. Also, many programming systems that
are closely related when viewed as nondeterministic look very different when implemented 
as deterministic languages. 

A deterministic rewriting system requires a recursive “choice” algorithm for selecting the 
next position to rewrite and the rule to apply. Whereas the “state” of a nondeterministic
computation is just the current configuration, the state of a deterministic computation may
depend upon the entire sequence of configurations since the beginning of the computation.
The results of learning in a deterministic system may lead to changes in both the rewrite
rules and the choice algorithms.
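To make the role of the “choice” algorithm concrete, here is a minimal sketch of a deterministic rewriting driver. It is an illustration only, not part of the NTTRS model: the Peano rules for plus, the extra rule plus(s(s(0)), y) => s(s(y)) standing in for a learned rule, and the two choice functions are all hypothetical. The choice function receives the whole history of configurations, reflecting the remark that a deterministic state may depend on the entire computation so far.

```python
def S(t):
    return ("s", t)

def is_var(t):
    return isinstance(t, str) and t.startswith("?")

def match(p, t, s):
    """One-way matching of pattern p against term t, extending substitution s."""
    if is_var(p):
        if p in s:
            return s if s[p] == t else None
        return {**s, p: t}
    if (isinstance(p, tuple) and isinstance(t, tuple)
            and p[0] == t[0] and len(p) == len(t)):
        for a, b in zip(p[1:], t[1:]):
            s = match(a, b, s)
            if s is None:
                return None
        return s
    return s if p == t else None

def subst(t, s):
    if is_var(t):
        return s.get(t, t)
    if isinstance(t, tuple):
        return (t[0],) + tuple(subst(a, s) for a in t[1:])
    return t

def positions(t, pos=()):
    """Yield (position, subterm) pairs in preorder (outermost, leftmost first)."""
    yield pos, t
    if isinstance(t, tuple):
        for i, a in enumerate(t[1:], start=1):
            yield from positions(a, pos + (i,))

def replace_at(t, pos, new):
    if not pos:
        return new
    i = pos[0]
    return t[:i] + (replace_at(t[i], pos[1:], new),) + t[i + 1:]

def candidates(t, rules):
    """All (position, rule index, substitution) triples applicable to t."""
    out = []
    for pos, sub in positions(t):
        for k, (lhs, _) in enumerate(rules):
            s = match(lhs, sub, {})
            if s is not None:
                out.append((pos, k, s))
    return out

def run(t, rules, choose, max_steps=100):
    history = [t]
    for _ in range(max_steps):
        cands = candidates(t, rules)
        if not cands:
            break
        pos, k, s = choose(history, cands)
        t = replace_at(t, pos, subst(rules[k][1], s))
        history.append(t)
    return history

RULES = [
    (("plus", "0", "?y"), "?y"),
    (("plus", S("?x"), "?y"), S(("plus", "?x", "?y"))),
    (("plus", S(S("0")), "?y"), S(S("?y"))),   # hypothetical learned rule
]

def choose_first(history, cands):
    return cands[0]                            # outermost position, lowest-numbered rule

def choose_learned(history, cands):
    return max(cands, key=lambda c: c[1])      # prefer later (learned) rules

start = ("plus", S(S("0")), "0")
slow = run(start, RULES, choose_first)
fast = run(start, RULES, choose_learned)
```

Both runs end in s(s(0)), but only the second choice function ever uses the added rule, which illustrates why proposing a rule and deciding how the deterministic process should use it are separate questions.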

The AL-1 and AL-2 algorithms propose changes only to the rules. The AL-1 algorithm 
proposes a single rule that compresses an n-step computation into a single rewriting step, 
and the AL-2 algorithm offers a set of rules that enable each step of the computation to be 
carried out in an enhanced language that has abbreviations for some of the ground terms. 
A real programming system can apply the AL-x algorithm(s), but will also need additional 
procedures for incorporating the learned rules into the deterministic process.

Questions of how best to incorporate the new rules into the program or to modify the
choice functions are outside the scope of these algorithms. Often called utility problems , 
such questions require that we make explicit assumptions, such as what possible choice 
functions are available to the system, how problem instances are presented, what performance 
characteristics we want to optimize by learning, etc. Studies such as [5, 9, 20, 21] have looked 
at ways to measure the potential effects on performance of incorporating certain rules into 
a program, and ways to decide if and how to make such changes. Since no universal results 
have yet emerged, the problem is evidently quite difficult. But by separating the process of 
proposing new rules from questions of utility, our model may make it easier to formalize and 
reason about such matters. To be sure, the failure to separate these issues has complicated 
a number of previous presentations of Analytical Learning research results. 

Conclusions 

A Nondeterministic, Typed-Term Rewriting System is a programming-language schema
that captures enough of the features of AI programming languages to enable us to cast 
the EBG learning algorithm in a very general form. In essence, this algorithm compresses a 
multi-step computation into one step and produces a single rewrite rule to carry out this step. 
An important consequence of this generalization is that EBG-like analytical learning can be 
applied to languages other than first-order predicate calculus. To show that the usefulness 
of these TRS’s extends beyond a single algorithm, another analytical learning algorithm 
(AL-2) has been derived and analyzed within the same TRS framework. This algorithm is 
similar to AL-1 in that it learns from a successful computation, preserves the semantics of 
the original program, and proposes new rules that streamline computations along the same 
path. It differs in deriving multiple rules from the computation and in helping to install new 
symbols, chosen by an outside element, that abbreviate certain frequently occurring terms. 

We have not discussed the computational complexity of our algorithms, but they are 
easily seen to require time and space polynomial in the length of the input computation. 
Algorithms for parsing a sentence in a context-free grammar and for computing a most- 
general unifier account for most of the running time. 

Since the focus of this work is theoretical and no extensive empirical tests of these al- 
gorithms have been carried out, the usefulness of these algorithms for Machine Learning 
remains to be investigated. Nevertheless, we would like to make a conjecture. In our scheme, 
a computation is any finite sequence of rewrites. In particular, there is no requirement that 
the final state in the sequence be a Church-Rosser normal form. Thus given a computation 
of length 5, we could apply AL-1 to the entire computation, or to only the first four steps, or 
to the first three, or the last three, etc. Each of these yields a new rule that may, potentially, 
be used to improve the program. Which sub-computation(s) should we give to AL-1 for 
analysis? This issue is fundamental to the concept of operationality that has been a focus
of much discussion [16, 23, 25]. 
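The size of this choice can be made concrete by enumerating the contiguous sub-computations of a five-step computation. The sketch below is purely illustrative: the labels r1 through r5 are hypothetical stand-ins for the five rewrite steps, and `subcomputations` is a name introduced here.

```python
def subcomputations(steps):
    """Group every contiguous sub-sequence of `steps` by its length."""
    by_length = {}
    for start in range(len(steps)):
        for end in range(start + 1, len(steps) + 1):
            by_length.setdefault(end - start, []).append(steps[start:end])
    return by_length

steps = ["r1", "r2", "r3", "r4", "r5"]   # hypothetical labels for the five rewrites
groups = subcomputations(steps)
counts = {length: len(subs) for length, subs in groups.items()}
total = sum(counts.values())
```

Each of the `total` sub-sequences is a legitimate input to AL-1, so a computation of length n offers n(n+1)/2 candidates.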

We have already remarked that when a path is extended, more restrictions apply, and 
the resulting rule from AL-1 is therefore less general. For this reason it seems reasonable
to recommend the following strategy: in any given computation, apply AL-1 to all
sub-computations with a length of two steps. Why length two? Length one is too small: AL-1
will never generalize. Lengths longer than two are compositions of two-step paths, so if a
particular path of length k > 2 occurs sufficiently often, the single rule compressing that
path will eventually be obtained, two steps at a time, by successive applications of AL-1.

APPENDIX: A Note on Sort Hierarchies and TTL’s

In the discussion of the subsumption lattice of typed-term languages, we noted that ttl’s 
enjoy a Robinsonian unification property: any two unifiable terms have a unique mgu, apart 
from syntactic variants. Walther [32] has shown that unification in a many-sorted language
with a sort hierarchy is Robinsonian iff the ordering is a forest. The purpose of this appendix 
is to clarify the relationship between unification in many-sorted languages and unification 
in ttl’s. We argue that a many-sorted language with a forest-ordered sort hierarchy can be 
generated by a ttg, but that the converse does not seem to be the case. 

We first need to define what constitutes a sub-sort (or sub-type) in a ttl. Only general 
types are considered here, since there is nothing corresponding to special types in Walther’s 
theory. The natural way is to define one type $G^2$ to be a sub-type of another type $G^1$ if
the symbol $G^2$ can be generated starting from $G^1$, i.e., $G^1 \to^* G^2$. Then the variables $x^1$
and $x^2$ (of types $G^1$ and $G^2$, respectively) are unifiable in our sense by the (unique) mgu:
$\theta = \{x^1 := x^2\}$. The sort hierarchy induced by a ttg need not be a forest, however, since the
grammar may contain productions such as $G^1 \to G^3$ and $G^2 \to G^3$. With such a grammar,
our unification algorithm would not admit unifying the terms $x^1$ and $x^2$, since $x^1 \notin \mathcal{L}(G^2)$
and $x^2 \notin \mathcal{L}(G^1)$. By contrast, in Walther’s formalism, the term $x^3$ (of sort $G^3$) would be an
mgu. If, however, there were a type (say, $G^0$) of which both $G^1$ and $G^2$ were sub-types, then
$x^1$ and $x^2$ would become unifiable in the ttl framework, and (as in many-sorted languages)
the mgu would not be unique: both $x^1$ and $x^2$ are mgu's. Note, however, that introducing
the type $G^0$ has rendered the grammar ambiguous; hence it is not a ttg. (Compare the
earlier discussion about Elementary Formal Systems, where, again, ambiguity led to loss of 
the Robinsonian property for unifiers.) 
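The sub-type relation and the forest condition discussed above can be checked mechanically on the type-to-type productions alone. The sketch below makes several simplifying assumptions: only productions of the form G -> G' are recorded, the hierarchy is assumed acyclic, and two variables are taken to be unifiable in the ttl sense exactly when one of their types is a sub-type of the other. The function names are illustrative.

```python
def reachable(prods, start):
    """All types derivable from `start` in zero or more steps."""
    seen, stack = {start}, [start]
    while stack:
        t = stack.pop()
        for u in prods.get(t, ()):
            if u not in seen:
                seen.add(u)
                stack.append(u)
    return seen

def is_subtype(prods, sub, sup):
    return sub in reachable(prods, sup)

def is_forest(prods):
    """True if no type has more than one immediate super-type (acyclicity assumed)."""
    parents = {}
    for sup, subs in prods.items():
        for sub in subs:
            parents.setdefault(sub, set()).add(sup)
    return all(len(p) <= 1 for p in parents.values())

def vars_unifiable(prods, t1, t2):
    return is_subtype(prods, t1, t2) or is_subtype(prods, t2, t1)

chain = {"G1": ["G2"]}                   # G1 -> G2
diamond = {"G1": ["G3"], "G2": ["G3"]}   # G1 -> G3 and G2 -> G3
```

On `chain` the variables of types G1 and G2 are unifiable and the hierarchy is a forest; on `diamond` neither holds, matching the two cases described in the text.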

Turning this around, we can make a many-sorted language with a forest-structured sort 
hierarchy into a ttg. Consider, for example, a sort $A$ with sub-sorts $A^1$ and $A^2$ and let $B$ be a
sort for which there are no sub-sorts. Suppose there is a function $f$ that takes as arguments
a pair of terms of sort $A^1$ and $B$ and returns a term of sort $A^2$. Assume, also, that there
are constants $a$, $a^1$, $a^2$, and $b$ of sort $A$, $A^1$, $A^2$, and $B$, respectively, and a countable set of
variables associated with each sort. We can construct a ttg that generates the terms in this 
algebra by assigning a general type to each sort and using the following productions: 


$$
\begin{array}{lcll}
A^i & \to & x^i_j, & j \ge 1,\ i \in \{1,2\} \\
A^i & \to & a^i, & i \in \{1,2\} \\
A   & \to & A^i, & i \in \{1,2\} \\
A   & \to & x_j, & j \ge 1 \\
A   & \to & a & \\
B   & \to & y_j, & j \ge 1 \\
B   & \to & b & \\
A^2 & \to & F & \\
F   & \to & f(A^1, B) &
\end{array}
$$

Here we have introduced $F$ as a special type and used the comma and parentheses as
constants. For example, the term $f(a^1, y_1)$ is in $\mathcal{L}(A^2)$. This illustrates how to construct a
ttg that generates the terms in a free many-sorted algebra with a forest hierarchy of types.
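Membership in the languages generated by this grammar can be tested with a few lines of code. This is a sketch of the example only, assuming the productions are as described above (each sort generates its own constants and variables, A generates its sub-sorts, and A^2 generates applications of f to an A^1 term and a B term). The encoding is hypothetical: constants are strings, a variable of a given sort is the tuple ("var", sort, index), and an application of f is ("f", t1, t2).

```python
CONSTS = {"a": "A", "a1": "A1", "a2": "A2", "b": "B"}
SUBSORTS = {"A": ["A1", "A2"], "A1": [], "A2": [], "B": []}

def in_lang(term, sort):
    """Decide whether `term` belongs to the language of `sort`."""
    # A constant or variable declared with exactly this sort.
    if isinstance(term, str) and CONSTS.get(term) == sort:
        return True
    if isinstance(term, tuple) and term[0] == "var" and term[1] == sort:
        return True
    # A2 -> F and F -> f(A1, B).
    if sort == "A2" and isinstance(term, tuple) and term[0] == "f" and len(term) == 3:
        return in_lang(term[1], "A1") and in_lang(term[2], "B")
    # Otherwise descend into the sub-sorts (A -> A1, A -> A2).
    return any(in_lang(term, sub) for sub in SUBSORTS[sort])

y1 = ("var", "B", 1)
good = ("f", "a1", y1)   # f(a^1, y_1)
bad = ("f", "a", y1)     # f(a, y_1): a has sort A, not A^1
```

Here `good` is accepted for both A^2 and A, while `bad` is rejected because its first argument is not in the language of A^1.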

By contrast, there does not seem to be an obvious way to represent the typed terms of a 
ttl as a simple many-sorted language, even if we impose a partially-ordered sort hierarchy. 



